Scalable encoding and decoding of high-resolution progressive video.

(57) High-resolution, progressive format video sig- 
nals (on C3100) having high frame rates may be 
encoded by a base layer encoder (C3140) and 
an enhancement layer encoder (C3180) to pro- 
vide two kinds of encoded video signals which 
share a common output channel (C3260). These 
encoded video signals are received at an input 
(C3270) of a video receiver which may use one 
or both of the two kinds of encoded video 
signal. Relatively lower performance high-
definition televisions may thus receive video
signals from higher performance transmitters 
and produce satisfactory pictures. Higher per- 
formance HDTVs will be able to utilize the full 
performance capabilities of these video signals. 
Technical Field

This invention relates to encoding and decoding 
of video signals. More particularly, this invention re- 
lates to multi-level encoding and decoding of high- 
resolution video signals, such as high-definition tele- 
vision (HDTV) signals. 

Background 



high-resolution, high-frame rate, progressive format
video signals. A base layer encoder is responsive to
the input for producing in a predetermined format
encoded video signals having a predetermined
resolution and a predetermined frame rate. An
enhancement layer encoder is also responsive to the
input for producing encoded video input signals at a
predetermined frame rate in a predetermined format at
a predetermined resolution level.

Other aspects of certain examples of this invention
include achieving migration to a high frame rate,
such as 60 Hz, progressive format in a manner
compatible with HDTVs having lesser capabilities,
various codec configurations representing complexity
and cost tradeoffs, adaptive channel sharing for
better overall picture quality for a given bandwidth,
and adaptive progressive to interlace conversion to
maintain resolution and prevent aliasing.

The discussion in this Summary and the following 
Brief Description of the Drawings, Detailed Descrip- 
tion, and drawings merely represents examples of 
this invention and is not to be considered in any way 
a limitation on the scope of the exclusionary rights 
conferred by a patent which may issue from this ap- 
plication. The scope of such exclusionary rights is set 
forth in the claims at the end of this application. 

Brief Description of the Drawings 

Fig. C3 is a block diagram of a two layer video co- 
dec with base layer progressive, enhancement layer 
progressive, and adaptive channel sharing, arranged 
in accordance with the principles of this invention. 

Fig. C4 is a block diagram of a two layer video
codec with base layer progressive, enhancement layer
progressive, and adaptive channel sharing, arranged
in accordance with the principles of this invention.

Fig. C5 is a block diagram of a two layer video
codec with base layer interlaced, enhancement layer
progressive, and adaptive channel sharing, arranged
in accordance with the invention.

Fig. C5a is a block diagram of a progressive to in- 
terlaced decimator. 

Fig. C5b is a block diagram of a spatio-temporal 
low-pass filter. 

Fig. C5c is a block diagram of a motion adaptive 
spatio-temporal low-pass filter. 

Fig. C6 is a block diagram of a two layer video co- 
dec with base layer interlaced, enhancement layer in- 
terlaced, and adaptive channel sharing, arranged in 
accordance with the principles of this invention. 

Fig. C6a is a block diagram of a progressive to 
two-interlaced converter. 

Fig. C6b is a block diagram of a two-interlaced to 
progressive converter. 

Fig. G12A is a picture structure for a base layer
with M=3 and an enhancement layer with I-pictures
spatial prediction, arranged in accordance with the



The Federal Communications Commission (FCC) now has
before it several different proposals regarding
standardization of high-definition television (HDTV)
systems. These proposals envision a short-term
solution to the problem of standardization involving
both interlaced format video and reduced resolution
progressively scanned video. For example, these
proposals involve interlaced format video comprising
a full 1,050 horizontal lines of picture elements
(pels) per frame (960 active lines), progressive
format video comprising 787.5 horizontal lines of
pels per frame (720 active lines) at frame rates of
60 Hz, 30 Hz, or 24 Hz, and progressive format video
comprising 1,050 horizontal lines of pels per frame
(960 active lines) at a frame rate of 30 Hz or 24 Hz.
At some unspecified time in the future, HDTV
standards are expected to migrate to a full
resolution progressively scanned format. For example,
it is expected that HDTV standards may evolve to a
progressive format video comprising a full 1,050
horizontal lines of pels at a full 60 Hz frame rate.
In the meantime, it is expected that a large amount
of expensive, relatively low capability HDTV
equipment will be made and sold which will use the
early low-capability standards. It will be necessary
for any high-capability future HDTV standards to
permit reception of reasonable pictures on these
earlier low-capability systems.

Summary 

During the transition period from the short-term
systems to the later higher performance
implementations, it will be necessary for older HDTV
sets to receive the new signals and display
reasonably good-looking pictures from those signals.
An advantageous way to provide for this capability is
through a technique of scalable coding of high
resolution progressive format video signals whereby a
base layer of coding and an enhancement layer of
coding are combined to form a new encoded video
signal. Older HDTVs will discard the enhancement
layer and derive their pictures only from the base
layer, which is made compatible with the initial
short-term FCC standards. Newer HDTVs will be able to
utilize both the base layer and the enhancement layer
to derive high-resolution, high-frame rate,
progressive format pictures.

In a specific example of this invention, a multi- 
layer video encoder comprises an input for receiving 
principles of this invention.

Fig. G12B is a picture structure for a base layer
with M=3, and an enhancement layer with I-pictures
and unidirectional prediction from the base layer,
arranged in accordance with the principles of this
invention.

Fig. G12D is a picture structure for a base layer
with M=3, and an enhancement layer with M=1 and
unidirectional prediction from the base layer,
arranged in accordance with the principles of this
invention.

Fig. G12E is a picture structure for a base layer
with M=3, and an enhancement layer with I-pictures
and unidirectional prediction from the base layer,
arranged in accordance with the principles of this
invention.

Fig. G12G is a picture structure for a base layer
with M=3, and an enhancement layer with M=1 and
bi-directional prediction from the base layer,
arranged in accordance with the principles of this
invention.

Fig. G12H is a picture structure for a base layer
with M=3, and an enhancement layer with M=3 and
unidirectional prediction from the base layer,
arranged in accordance with the principles of this
invention.

Fig. G12I is a picture structure for a base layer
with M=3, and an enhancement layer with I-pictures
spatial prediction at twice the picture rate of the
base layer, arranged in accordance with the
principles of this invention.

Fig. G12J is a picture structure for a base layer
with M=3, and an enhancement layer with M=1 and
unidirectional prediction from the base layer at
twice the picture rate of the base layer, arranged in
accordance with the principles of this invention.

Fig. E12A is a block diagram of a two layer encoder
for a base layer with M=3, and an enhancement layer
with I-pictures spatial prediction, arranged in
accordance with the principles of this invention.

Fig. E12B is a block diagram of a two layer encoder
for a base layer with M=3, and an enhancement layer
with I-pictures and unidirectional prediction from
the base layer, arranged in accordance with the
principles of this invention.

Fig. E12D is a block diagram of a two layer encoder
for a base layer with M=3, and an enhancement layer
with M=1 and unidirectional prediction from the base
layer, arranged in accordance with the principles of
this invention.

Fig. E12E is a block diagram of a two layer encoder
for a base layer with M=3, and an enhancement layer
with I-pictures and unidirectional prediction from
the base layer, arranged in accordance with the
principles of this invention.

Fig. E12G is a block diagram of a two layer encoder
for a base layer with M=3, and an enhancement layer
with M=1 and bi-directional prediction from the base
layer, arranged in accordance with the principles of
this invention.

Fig. E12H is a block diagram of a two layer encoder
for a base layer with M=3, and an enhancement layer
with M=3 and unidirectional prediction from the base
layer, arranged in accordance with the principles of
this invention.

Fig. E12I is a block diagram of a two layer encoder
for a base layer with M=3, and an enhancement layer
with I-pictures spatial prediction at twice the
picture rate of the base layer, arranged in
accordance with the principles of this invention.

Fig. E12J is a block diagram of a two layer encoder
for a base layer with M=3, and an enhancement layer
with M=1 and unidirectional prediction from the base
layer at twice the picture rate of the base layer,
arranged in accordance with the principles of this
invention.

Fig. D12A is a block diagram of a two layer decoder
for a base layer with M=3, and an enhancement layer
with I-pictures spatial prediction, arranged in
accordance with the principles of this invention.

Fig. D12B is a block diagram of a two layer decoder
for a base layer with M=3, and an enhancement layer
with I-pictures and unidirectional prediction from
the base layer, arranged in accordance with the
principles of this invention.

Fig. D12D is a block diagram of a two layer decoder
for a base layer with M=3, and an enhancement layer
with M=1 and unidirectional prediction from the base
layer, arranged in accordance with the principles of
this invention.

Fig. D12E is a block diagram of a two layer decoder
for a base layer with M=3, and an enhancement layer
with I-pictures and unidirectional prediction from
the base layer, arranged in accordance with the
principles of this invention.

Fig. D12G is a block diagram of a two layer decoder
for a base layer with M=3, and an enhancement layer
with M=1 and bi-directional prediction from the base
layer, arranged in accordance with the principles of
this invention.

Fig. D12H is a block diagram of a two layer decoder
for a base layer with M=3, and an enhancement layer
with M=3 and unidirectional prediction from the base
layer, arranged in accordance with the principles of
this invention.

Fig. D12I is a block diagram of a two layer decoder
for a base layer with M=3, and an enhancement layer
with I-pictures spatial prediction at twice the
picture rate of the base layer, arranged in
accordance with the principles of this invention.

Fig. D12J is a block diagram of a two layer decoder
for a base layer with M=3, and an enhancement layer
with M=1 and unidirectional prediction from the base
layer at twice the picture rate of the base layer,
arranged in accordance with the principles of this
invention.

Fig. ED shows an example of the frame reorganizer
block "ORG" of the encoder and decoder block diagrams
mentioned above.

Fig. F2 shows progressive to interlace conver- 
sion, in accordance with this invention. 

Fig. F3 shows the operation of decimation for
progressive to interlace conversion, arranged in
accordance with this invention.

Fig. F4 shows the operation of interpolation for
interlace to progressive conversion, arranged in
accordance with this invention.

Fig. F5 shows the operation of progressive to
two-interlace conversion, arranged in accordance
with this invention.

Fig. F6 shows the operation of two-interlace to
progressive conversion, arranged in accordance with
this invention.

The following abbreviations have been used in 
the drawings listed above: 

prog - progressive video
interl - interlaced video
MC - Motion Compensation
ME - Motion Estimation
mv - motion vectors
T - Transform (e.g. a Discrete Cosine Transform [DCT])
IT - Inverse Transform (e.g. an Inverse Discrete Cosine Transform [IDCT])
Q - Quantizer
IQ - Inverse Quantizer
QA - Quantizer Adapter
qs - quantizer step size
VE - Variable Length Encoder
VD - Variable Length Decoder
WT - Weighter (generalized switch)
PS - Previous Picture Store
NS - Next Picture Store
XS - Extra Picture Store
YS - Extra Picture Store
SW - Switch
BF - Buffer

Detailed Description 

Fig. C3 shows an illustrative spatially scalable
system in accordance with this invention using a base
layer consisting of lower resolution, progressive
television. In this scenario, current FCC plans call
for a base layer active video of 1280 pels, 720 scan
lines, 60 Hz frame rate, 1:1 progressive. The video
input is at the same frame rate, but has a higher
resolution. For example, one plan calls for up to
1920 pels, 960 lines, 60 Hz, 1:1 progressive. Another
plan calls for up to 1920 pels, 1080 lines, 60 Hz,
1:1 progressive.

Such progressive high resolution video enters the
circuit of Fig. C3 on bus c3100 and passes to a
spatial decimator c3120, where it may be low-pass
filtered before reducing the number of pels to a
lower base-layer resolution. The decimated base layer
video is then output on bus c3130 and passes to a
base encoder c3140, which outputs a typically
variable bit-rate coded bit-stream on bus c3230.

Base encoder c3140 also outputs a replica decoded
base layer video signal on bus c3150, which passes to
a spatial interpolator c3160. Spatial interpolator
c3160 increases the number of pels per frame using
any interpolation method well known in the art. This
"upsampled" video is output on bus c3170 and passes
to an enhancement encoder c3180, which outputs a
typically variable bit-rate coded bit-stream on bus
c3220.

Enhancement encoder c3180 utilizes the upsam- 
pled video on bus c3170 as a prediction, in order to 
increase the efficiency of coding the full resolution 
progressive video input on bus c3100. An example of 
such encoding is described below. 
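
The prediction path described above can be sketched on a single row of pels. This is only an illustration under stated assumptions: the 2:1 pel ratio, the pair-averaging decimator, and the pel-repetition interpolator are hypothetical choices (the figures quoted above imply a 3:2 ratio, and the text leaves the interpolation method open), and the base layer is treated as losslessly coded.

```python
def decimate_2to1(row):
    """Average each pair of pels: a crude low-pass filter plus 2:1 subsampling."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def interpolate_1to2(row):
    """Upsample by pel repetition, the simplest possible interpolator."""
    out = []
    for pel in row:
        out.extend([pel, pel])
    return out


def enhancement_residual(full_row, upsampled_row):
    """Difference between the full resolution input and the upsampled prediction."""
    return [f - u for f, u in zip(full_row, upsampled_row)]


full = [10, 12, 20, 24, 30, 30]               # full resolution input row
base = decimate_2to1(full)                    # base layer row
prediction = interpolate_1to2(base)           # upsampled prediction
residual = enhancement_residual(full, prediction)
# A decoder with both layers adds the residual back onto the prediction.
reconstructed = [p + r for p, r in zip(prediction, residual)]
```

Only the residual needs to be carried by the enhancement layer in this sketch, which is why a good prediction improves coding efficiency.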

The two variable-rate bit-streams on buses c3230 and
c3220 pass to buffers c3190 and c3210, respectively.
Typically, bits are read out of the buffers at a
different instantaneous rate than bits are written
into the buffers. Because of this, there is the
possibility that overflow or underflow might occur.
To alleviate this possibility, buffer c3190 outputs a
fullness signal on bus c3200, and buffer c3210
outputs a fullness signal on bus c3225.

The fullness of buffer c3190 appearing on bus c3200
passes to both the base encoder c3140 and the
enhancement encoder c3180. Base encoder c3140
utilizes this fullness signal to control the data
flow into buffer c3190 according to any method of
controlling data flow well known in the art.

The fullness signal from buffer c3210 appearing on
bus c3225 passes to enhancement encoder c3180. In
many scalable implementations, the coded picture
quality of the base layer will be the overriding
consideration in allocating bit-rate to the
enhancement layer. In fact, the coding efficiency of
the enhancement encoder usually depends on a high
quality base layer picture. For these reasons,
enhancement encoder c3180 utilizes both buffer
fullness signals in controlling the data flow into
buffer c3210. For example, it may utilize the sum of
the two fullnesses. Also, for example, if at any time
buffer c3190 were deemed too full, then enhancement
encoder c3180 could cease producing data altogether
for the enhancement layer, thereby allocating the
entire transmission bit-rate to the base layer.
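
The two examples just given (using the sum of the fullnesses, and stopping enhancement data when the base buffer is too full) might be combined as in the following sketch. The function name, the normalized 0 to 1 fullness scale, the 0.9 cutoff, and the mapping of the sum to a "pressure" value are all assumptions made for illustration; the text does not specify them.

```python
def enhancement_rate_control(base_fullness, enh_fullness, base_cutoff=0.9):
    """Decide how hard the enhancement encoder should throttle its output.

    Fullness values are assumed to be normalized to the range 0..1.
    Returns None when the enhancement layer should produce no data at all,
    otherwise a pressure value in 0..1 (higher means coarser coding).
    """
    if base_fullness >= base_cutoff:
        # Base buffer deemed too full: give the whole channel to the base layer.
        return None
    combined = base_fullness + enh_fullness  # sum of the two fullnesses
    return min(1.0, combined / 2.0)


print(enhancement_rate_control(0.2, 0.4))   # moderate pressure
print(enhancement_rate_control(0.95, 0.1))  # None: enhancement layer suspended
```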

Data is read out of buffers c3190 and c3210 on buses
c3240 and c3250, respectively, under control of
Systems Multiplexer c3250, which typically combines
the two bit-streams in preparation for transmission
on Channel c3260. Alternatively, the two bit-streams
could be sent on two separate and independent
channels.

If the two bit-streams are multiplexed, then systems
demultiplexer c3270 at the receiver separates them
again and outputs them on buses c3290 and c3280. In
the absence of transmission errors, the bit-stream on
bus c3250 appears on bus c3290 and the bit-stream on
bus c3240 appears on bus c3280.

The two bit-streams on buses c3290 and C3280 
enter an enhancement decoder c3340 and a base de- 
coder c3300, respectively. Base decoder c3300 pro- 
duces a base layer video signal on bus c3310, which, 
in the absence of transmission errors, is exactly the 
same as the replica decoded video on bus c3150. 

The decoded base layer video on bus c3310 also passes
to spatial interpolator c3320, which is a duplicate
of the interpolator c3160 and which produces an
upsampled video on bus c3330. In the absence of
transmission errors, the upsampled video on buses
c3330 and c3170 are identical. Enhancement decoder
c3340 utilizes the upsampled video on bus c3330 in
conjunction with the enhancement layer bit-stream on
bus c3290 to produce a decoded full resolution,
progressive video on bus c3350. A detailed example of
such decoding is described below.

Fig. C4 shows another version of the invention 
where the base layer comprises a progressive video 
signal at full resolution, but at half the frame rate of 
the original. Current FCC plans call for a "film" mode 
of perhaps up to 1920 pels, 1080 lines, 30 Hz frame 
rate, 1:1 progressive. The system of Fig. C4 compris- 
es a temporal demultiplexer c4130 which is a simple 
switching mechanism that routes alternate frames of 
progressive input video to output buses c4120 and 
C4110. 

A base encoder c4140 may operate in essentially the
same way as the base encoder c3140 in Fig. C3, except
that it codes full resolution video at half the frame
rate. A replica decoded base layer video is output on
bus c4180 that is full resolution. Thus, there is no
need for upsampling prior to delivery to an
enhancement encoder c4160.

The enhancement encoder c4160 operates in exactly the
same way as the enhancement encoder c3180 in Fig. C3.
However, in this case the prediction picture on bus
c4150 is temporally shifted from the video frames on
bus c4120 that are to be encoded. For this reason a
simple coding of the difference is not the most
efficient method. An example of encoding is described
below.

The remaining operations of the encoding, multi- 
plexing, demultiplexing and base decoding are iden- 
tical to those of Fig. C3. In the absence of transmis- 
sion errors, the decoded base layer video on buses 
C4300, c4310 and c4320 is identical to the replica de- 
coded video on bus c4150. 

An enhancement decoder c4290 produces a full
resolution, half frame-rate video on bus c4340. These
frames occur temporally at times half way between the
frame times of the base layer video on bus c4320. The
details of an example of such decoding are described
below.

A temporal multiplexer c4330 may comprise a simple
switching mechanism that alternately feeds the frames
on buses c4320 and c4340 to the output bus c4350 to
provide a full resolution, full frame rate,
progressive video.
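
The switching behavior of the temporal demultiplexer and temporal multiplexer of Fig. C4 can be sketched as follows. The function names are hypothetical, frames are represented by arbitrary Python objects, and an even number of input frames is assumed.

```python
def temporal_demux(frames):
    """Route alternate frames of the progressive input to two outputs."""
    return frames[0::2], frames[1::2]


def temporal_mux(first, second):
    """Alternately feed frames from two half-rate streams to one output."""
    out = []
    for a, b in zip(first, second):
        out.extend([a, b])
    return out


frames = ["f0", "f1", "f2", "f3", "f4", "f5"]   # full frame rate input
half_a, half_b = temporal_demux(frames)          # two half frame rate streams
restored = temporal_mux(half_a, half_b)          # full frame rate again
```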

Fig. C5 shows another example of the invention where
the base layer comprises an interlaced signal at full
resolution, but at half the frame rate of the
original progressive input. Current FCC plans call
for an interlace mode of perhaps up to 1920 pels,
1080 lines, 30 Hz frame rate, 2:1 interlaced.
Progressive-to-interlace decimator c5110 converts
each pair of progressive frames on bus c5100 to a
single interlaced frame and outputs the result on bus
c5120. Apparatus and methods for converting
progressive format video to interlace format video
are described below.

A base encoder c5130 operates in exactly the same way
as the base encoder c3140 in Fig. C3, except that it
codes full resolution interlaced video at half the
frame rate. A replica decoded base layer video is
output on bus c5140 that is full resolution,
interlaced.

Interlace-to-progressive interpolator c5150 converts
each replica decoded interlaced frame input from bus
c5140 into two progressive frames in a manner to be
described below. The resulting upsampled progressive
video is output on bus c5160 and fed to an
enhancement encoder c5170. Enhancement encoder c5170
operates in exactly the same way as the enhancement
encoder c3180 in Fig. C3.

The remaining operations of encoding, multiplexing,
demultiplexing and base decoding are identical to
those corresponding operations of Fig. C3. In the
absence of transmission errors, the decoded base
layer video on buses c5320 and c5330 is identical to
the replica decoded video on bus c5140.
Interlace-to-progressive interpolator c5340, which is
identical to that of element c5150, upsamples the
interlaced video on bus c5330 and produces on bus
c5350 a full resolution, full frame rate, progressive
video that, in the absence of transmission errors, is
identical to the video on bus c5160.

Enhancement decoder c5360 utilizes the upsampled
video on bus c5350 in conjunction with the
enhancement layer bit-stream on bus c5290 to produce
a decoded full resolution, full frame rate,
progressive video on bus c5370. An example of such
decoding is described in detail below.

In converting from a progressive scanned television
signal containing, for example, 60 complete frames
per second to an interlaced signal containing 30
frames per second, a fixed spatial low-pass filter
may be used on each progressive frame, as shown in
Fig. C5a. Following the filtering operation of a
filter C5a110, each pair of progressive frames is
converted to an interlaced frame by taking the odd
numbered TV lines from the first frame of the
progressive pair and the even numbered TV lines from
the second frame of the progressive pair. This is
accomplished by a subsampler C5a120, which discards
alternate TV lines of each progressive frame.
Following the subsampling operation, a line buffer
C5a130 serves to stretch the duration of each
retained TV scan line by a factor of two to meet the
timing requirements of the resulting interlaced video
signal, which is output on bus C5a140. Although the
buffer C5a130 is shown in Fig. C5a as a separate
item, it may be incorporated into the operation of
the base encoder that follows.
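
The line selection performed by the subsampler can be sketched as below. Each frame is modeled as a list of TV lines, with list index 0 holding TV line 1; the low-pass filter and the line buffer timing are deliberately left out, and the function name is hypothetical.

```python
def progressive_pair_to_interlaced(frame1, frame2):
    """Form one interlaced frame (two fields) from a pair of progressive frames.

    Odd numbered TV lines come from the first frame and even numbered TV
    lines from the second; all other lines are discarded.
    """
    field1 = frame1[0::2]  # TV lines 1, 3, 5, ... of the first frame
    field2 = frame2[1::2]  # TV lines 2, 4, 6, ... of the second frame
    return field1, field2


frame1 = ["a1", "a2", "a3", "a4"]
frame2 = ["b1", "b2", "b3", "b4"]
field1, field2 = progressive_pair_to_interlaced(frame1, frame2)
```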

The line subsampling operation is shown graphically
in Fig. F2. The spatial filtering is normally
employed only in the vertical dimension of each video
frame. The combined filtering/subsampling operation
is shown graphically in Fig. F3. Here, an example
11-tap vertical filter is applied to each progressive
video frame. The first filtered frame of each
progressive pair is subsampled vertically to form the
first field of the interlaced video frame. Similarly,
the second frame of each progressive pair becomes the
second field of the interlaced frame. Also shown in
Fig. F3 for illustration is an example of a 7-tap
vertical filter.

The operation of interlace-to-progressive
interpolator c5340 is shown graphically in Fig. F4.
Construction of the first frame of each progressive
pair is shown at the top. Lines A, C, E, G, ... exist
in interlaced field 1, and lines B, D, F, ... exist in
interlaced field 2. Interpolation to obtain missing
line D of the progressive frame is shown as
D = (C + E + 2aD - aB - aF)/2, where typically
0<a<1 and the lines B, D and F on the right-hand side
are taken from field 2. For the second frame of each
progressive pair a similar interpolation produces
missing line C.
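
A minimal sketch of the interpolation follows, assuming the formula reads D = (C + E + 2aD - aB - aF)/2 with C and E the lines above and below the missing line in the current field and B, D, F the neighboring lines of the other field. The function name and the default a = 0.5 are hypothetical.

```python
def interpolate_missing_line(c, e, b, d, f, a=0.5):
    """Estimate a missing progressive line, pel by pel.

    c, e: lines above and below the missing line in the current field.
    b, d, f: lines of the other field at the positions above, at, and
             below the missing line.
    a: weight given to the vertical detail taken from the other field.
    """
    return [
        (ci + ei + 2 * a * di - a * bi - a * fi) / 2
        for ci, ei, bi, di, fi in zip(c, e, b, d, f)
    ]


flat = [100, 100, 100]
# On a flat picture the interpolated line equals the surrounding lines.
print(interpolate_missing_line(flat, flat, flat, flat, flat))
# With a = 0 the result is the plain average of the lines above and below.
print(interpolate_missing_line([10], [30], [0], [0], [0], a=0))
```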

The low-pass spatial filter C5a110 is needed to
alleviate line flicker when displaying the interlaced
frames on an interlaced display. The effect of the
low-pass spatial filter is to significantly reduce
the resolution by blurring the visual information in
the progressive frames. While this blurring is
necessary to reduce line flicker in moving areas of
the picture, it is unnecessary in stationary areas.
Since text and computer graphics are often displayed
without motion, they are particularly penalized by
the low-pass spatial filter.

A solution to this dilemma is to use an adaptive
spatio-temporal low-pass filter that blurs only the
moving parts of the picture and leaves the stationary
parts at full resolution. One simple example is a
three-tap finite-impulse-response (FIR) temporal
filter. Such a filter is shown in Fig. C5b, where two
progressive frame delays are used to form a weighted
average of three progressive frames prior to TV line
subsampling. Weighted averager C5b130 in a
nonadaptive arrangement may apply a weight W, where
0<W<1, to the signal on line C5b160 corresponding to
the middle frame, and a weight (1-W)/2 to each of the
signals on lines C5b150 and C5b170 corresponding to
the remaining two frames. This weighting is followed
by a summation of the three weighted signals to form
the filtered progressive video signal output on bus
C5b140.
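
The weighted average formed by the three-tap temporal filter can be sketched as follows. Frames are modeled as lists of rows of pel values; the function name is hypothetical.

```python
def temporal_filter(prev_frame, mid_frame, next_frame, w):
    """Three-tap temporal FIR filter over three progressive frames.

    The middle frame gets weight w and each neighbor gets (1 - w) / 2,
    so the three weights sum to one.
    """
    side = (1.0 - w) / 2.0
    return [
        [side * p + w * m + side * n for p, m, n in zip(row_p, row_m, row_n)]
        for row_p, row_m, row_n in zip(prev_frame, mid_frame, next_frame)
    ]


still = [[8, 8], [8, 8]]
# A stationary area passes through unchanged.
print(temporal_filter(still, still, still, w=0.5))
# A pel that changes between frames is averaged (blurred) over time.
print(temporal_filter([[0]], [[8]], [[0]], w=0.5))
```

Because the weights sum to one, stationary picture areas keep their full resolution while changing pels are smoothed.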

If the motion is moderate to rapid, blurring may be 
introduced by such a nonadaptive temporal filtering. 
Fig. C5c shows a motion adaptive filter that estimates 
the speed of motion and adjusts the weight W accord- 
ingly. Module C5c190 produces motion estimation 
signals. In many implementations, these signals may 
already be available as a result of the video coding 
process. The resulting motion vector MV is output on 
bus C5c210 and fed to a lookup table C5c200, which 
produces the weighting value W according to the 
amount of local motion. Weighting value W is output 
on line C5c180 and fed to the weighted averager 
C5c130, where it is used as described above for the 
apparatus of Fig. C5b. 
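
The lookup step might be sketched as below. The table entries, the use of the motion vector magnitude as the measure of local motion, and the choice to raise W (that is, to filter less) as motion increases are all assumptions made for illustration; the text states only that W is chosen according to the amount of local motion.

```python
import math

# Hypothetical table: (upper bound on motion magnitude in pels per frame, W).
WEIGHT_TABLE = [(0.5, 0.5), (2.0, 0.7), (8.0, 0.9)]
FASTEST_MOTION_W = 0.95  # assumed weight for motion beyond the last bound


def weight_for_motion(mv):
    """Map a motion vector (dx, dy) to a middle-frame weight W."""
    magnitude = math.hypot(mv[0], mv[1])
    for bound, w in WEIGHT_TABLE:
        if magnitude <= bound:
            return w
    return FASTEST_MOTION_W


print(weight_for_motion((0, 0)))   # little or no motion
print(weight_for_motion((3, 4)))   # magnitude 5
print(weight_for_motion((10, 0)))  # rapid motion
```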

Fig. C6 shows another example of the invention where
the base layer consists of an interlaced signal at
full resolution, but half the frame rate of the
original. Here, a progressive-to-two-interlace
converter c6110 converts each pair of progressive
frames on bus c6100 into two interlaced frames and
outputs the results on buses c6120 and c6130.

Fig. C6a shows an example of a progressive to two
interlace converter. The operation is exactly the
same as in Fig. C5a except that instead of discarding
every other TV line, the alternate line switch C6a120
feeds each TV line to alternate outputs. Thus, for
the first frame of a progressive frame pair, the odd
numbered TV lines are fed to output bus C6a170
(interl_1), and the even numbered TV lines are fed to
C6a180 (interl_2). For the second frame of a
progressive frame pair, the even numbered TV lines
are fed to output bus C6a170 (interl_1), and the odd
numbered TV lines to bus C6a180 (interl_2). The
operation of the progressive to two interlace
converter is shown graphically in Fig. F5.
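
The routing performed by the alternate line switch can be sketched as follows, with each frame modeled as a list of TV lines (list index 0 is TV line 1). The filter stage is omitted and the function name is hypothetical.

```python
def progressive_pair_to_two_interlaced(frame1, frame2):
    """Split a pair of progressive frames into two interlaced frames.

    Each interlaced frame is returned as a (first field, second field) tuple.
    No TV line is discarded: every line goes to one output or the other.
    """
    interl_1 = (frame1[0::2], frame2[1::2])  # odd lines of frame 1, even lines of frame 2
    interl_2 = (frame1[1::2], frame2[0::2])  # even lines of frame 1, odd lines of frame 2
    return interl_1, interl_2


frame1 = ["a1", "a2", "a3", "a4"]
frame2 = ["b1", "b2", "b3", "b4"]
interl_1, interl_2 = progressive_pair_to_two_interlaced(frame1, frame2)
```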

Base encoder c6140 operates in exactly the 
same way as the base encoder C3140 in Fig. C3, ex- 
cept that it codes full resolution, interlaced video at 
half the frame rate. A replica decoded base layer vid- 
eo is output on bus c6150 and passed to enhance- 
ment encoder c6160 for use in coding the interlaced 
video input on bus C6120. 

Enhancement encoder c6160 operates in exactly the
same way as the enhancement encoder c3180 in Fig. C3.
However, in this case the prediction interlaced
picture on bus c6150 is temporally shifted from the
video on bus c6120 that is to be encoded. For this
reason a simple coding of the difference is not the
most efficient method. An example of encoding is
described below.

The remaining operations of encoding, multiplexing,
demultiplexing and base decoding are identical to
those operations in Fig. C3. In the absence of
transmission errors the decoded base layer video on
buses c6320, c6330 and c6340 is identical to the
replica decoded video on bus c6150.

EP 0 634 871 A2

Enhancement decoder c6310 produces a full re- 
solution, interlaced, half frame-rate video on bus 
c6360. The fields of this interlaced video are tempor- 
ally displaced from the field times of the base layer 
interlaced video on bus c6320. An example of such 
decoding is described below. 

Thus, the decoding process produces an interlaced
30 Hz TV signal on each of the outputs c6340 and
c6360. The two interlaced signals are then combined
to produce a progressive 60 Hz TV signal by
two-interlace to progressive converter c6350 in Fig.
C6.

Two-interlace to progressive converter c6350
basically combines one field on bus c6360 with the
temporally corresponding field on bus c6340 to form a
progressive frame that is output on bus c6370. Its
detailed operation is shown in Fig. C6b, which is
basically the reverse of the operation of the
apparatus of Fig. C6a. However, here there is no need
for a filter. The processing is also shown
graphically in Fig. F6.
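
A sketch of the recombination follows. It assumes the line allocation described for Fig. C6a (odd lines of the first frame and even lines of the second in interl_1, the complementary lines in interl_2) and models each interlaced frame as a (first field, second field) tuple of TV lines; the function names are hypothetical.

```python
def merge_fields(odd_lines, even_lines):
    """Interleave odd and even numbered TV lines into one progressive frame."""
    frame = []
    for odd, even in zip(odd_lines, even_lines):
        frame.extend([odd, even])
    return frame


def two_interlaced_to_progressive(interl_1, interl_2):
    """Rebuild a pair of progressive frames from two interlaced frames."""
    # First frame: odd lines from interl_1, even lines from interl_2.
    frame1 = merge_fields(interl_1[0], interl_2[0])
    # Second frame: odd lines from interl_2, even lines from interl_1.
    frame2 = merge_fields(interl_2[1], interl_1[1])
    return frame1, frame2


interl_1 = (["a1", "a3"], ["b2", "b4"])
interl_2 = (["a2", "a4"], ["b1", "b3"])
frame1, frame2 = two_interlaced_to_progressive(interl_1, interl_2)
```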

The system of Fig. C6b produces a low-pass filtered
progressive output. If only a fixed spatial low-pass
filter were used, the quality of the progressive
output might not be acceptable due to the overall
blurriness. The quality may be markedly improved by
using the aforementioned motion adaptive filters in
module c6a110.

Fig. E12A shows an example of a base encoder and an
enhancement encoder corresponding to the ones shown
in Fig. C3. High resolution video enters on bus
e12a100. Spatial decimator e12a110 reduces the number
of pels per frame, as described above, and outputs
the base layer video on bus e12a120 to a base
encoder.

The base encoder may be a motion picture ex- 
perts group (MPEG) arrangement, which for general- 
ity is shown as coding MPEG I, B and P pictures ac- 
cording to the structure graphically shown in Fig. 
G12A. A frame reorganizer block ORG e12a130 reor- 
ders the input frames in preparation for coding and 
outputs the result on buses e12a140 and e12a150. 
An example of a frame reorganizer block is shown in 
Fig. ED. 

A motion estimator e12a170 examines the input frame
on bus e12a150 and compares it with one or two
previously coded frames. If the input frame is type I
or P then one previous frame is used. If it is type B
then two previously coded frames are used.

Motion estimator e12a170 outputs motion vectors on
bus e12a175 for use by motion compensator e12a180 and
on bus e12a305 for use by a variable encoder e12a310.
Motion compensator e12a180 utilizes the motion
vectors and pels from previously coded frames to
compute (for P and B type frames) a motion
compensated prediction that is output on bus e12a230,
which is passed to buses e12a240 and e12a250. For I
type frames, the motion compensator e12a180 outputs
zero pel values.

Subtracter e12a160 computes the difference between
the input frame on bus e12a140 and (for P and
B types) the prediction frame on bus e12a250. The
result appears on bus e12a260, is transformed by
transformer e12a270 and quantized by quantizer
e12a290 into typically integer values. Quantized
transform coefficients pass on bus e12a300 to
variable encoder e12a310 and inverse quantizer
e12a380.

Inverse quantizer e12a380 converts the quantized
transform coefficients back to full range and
passes the result via bus e12a390 to inverse
transform e12a400, which outputs pel prediction error
values on bus e12a410. Adder e12a420 adds the
prediction error values on bus e12a410 to the
prediction values on bus e12a240 to form the coded
base layer pels on buses e12a430 and e12a440.
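
As a minimal sketch of the loop formed by the subtracter, quantizer, inverse quantizer and adder, the following Python fragment codes one row of pels. It rests on stated assumptions: the transform and inverse transform are omitted for brevity, the quantizer is a plain uniform one, and the names quantize, dequantize and code_pels are hypothetical.

```python
def quantize(values, qs):
    """Uniform quantizer: index of the multiple of qs nearest to each value."""
    return [int(round(v / qs)) for v in values]


def dequantize(levels, qs):
    """Inverse quantizer: convert integer levels back to full range."""
    return [level * qs for level in levels]


def code_pels(input_pels, prediction_pels, qs):
    """Return (levels, coded_pels) for one row of pels."""
    # Subtracter: prediction error between input and prediction.
    error = [x - p for x, p in zip(input_pels, prediction_pels)]
    # Quantizer (the transform step is omitted in this sketch).
    levels = quantize(error, qs)
    # Inverse quantizer followed by the adder gives the coded pels,
    # i.e. the same values a decoder would reconstruct.
    decoded_error = dequantize(levels, qs)
    coded = [p + e for p, e in zip(prediction_pels, decoded_error)]
    return levels, coded


levels, coded = code_pels([101, 105, 97, 90], [98, 98, 98, 98], qs=4)
```

For an I type frame the prediction would be all zeros, so the coded pels are simply the dequantized input.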

For I and P type frames, switch e12a435 passes
the coded pels input on bus e12a430 to a next picture
store e12a200 via a bus e12a205. Simultaneously,
the frame that was in next picture store e12a200
passes via bus e12a195 to previous picture store
e12a190. For B type frames, switch e12a435 takes no
action, and the contents of picture stores e12a190
and e12a200 remain unchanged.
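
The update rule for the two picture stores can be illustrated as follows. This is a sketch only; the class name PictureStores and the frame labels are hypothetical, and frames are represented by strings rather than pel arrays.

```python
class PictureStores:
    """Two reference stores, loosely modelled on e12a190 and e12a200."""

    def __init__(self):
        self.previous = None  # previous picture store
        self.next = None      # next picture store

    def update(self, frame_type, coded_frame):
        if frame_type in ("I", "P"):
            # The old "next" frame shifts to "previous" and the newly
            # coded frame becomes "next".
            self.previous = self.next
            self.next = coded_frame
        elif frame_type != "B":
            raise ValueError("unknown frame type: " + frame_type)
        # B frames leave both stores unchanged.


stores = PictureStores()
history = []
# Frames are presented in coding order, labelled by display position.
for frame_type, name in [("I", "I0"), ("P", "P3"), ("B", "B1"),
                         ("B", "B2"), ("P", "P6")]:
    stores.update(frame_type, name)
    history.append((stores.previous, stores.next))
```

While B1 and B2 are coded, the stores hold I0 and P3, the two frames that surround them in display order.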

The contents of picture stores e12a190 and
e12a200 pass to motion estimator e12a170 and motion
compensator e12a180 via buses e12a210 and
e12a220 for use as needed.

The quantizer step size qs that is used by quantizer
e12a290 and inverse quantizer e12a380 is computed
adaptively by quantization adapter e12a360
depending on the aforementioned buffer fullness
indication on bus e12a350. The step size passes via
bus e12a370 to quantizer e12a290 and inverse
quantizer e12a380. The qs variable also passes to variable
encoder e12a310 via bus e12a375.
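
One simple way to picture the adaptation is a step size that grows as the buffer fills. The text does not give the actual rule, so the linear mapping below, the function name adapt_step_size and the limits qs_min and qs_max are all assumptions made for illustration.

```python
def adapt_step_size(fullness, capacity, qs_min=2, qs_max=62):
    """Map buffer fullness to a quantizer step size.

    A fuller buffer yields a coarser step, which lowers the bit rate
    produced by the variable encoder. The linear law is an assumption.
    """
    fraction = min(max(fullness / capacity, 0.0), 1.0)
    return qs_min + fraction * (qs_max - qs_min)


empty_qs = adapt_step_size(0, 1000)
half_qs = adapt_step_size(500, 1000)
full_qs = adapt_step_size(1000, 1000)
```

Any monotonically increasing mapping would serve the same purpose; the essential point is the feedback from buffer fullness to step size.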

Variable encoder e12a310 encodes quantized
transform coefficients input on bus e12a300, motion
vectors input on bus e12a305 and quantizer step
sizes qs input on bus e12a375 into a typically variable
bit-rate bit-stream that is output on bus e12a320.
This bit-stream on bus e12a320 then passes to a
buffer e12a330 for temporary storage until it passes
via bus e12a340 to the Systems Multiplexer. Also as
described above, the fullness of buffer e12a330 is
directed to the base encoder and the enhancement
encoder of Fig. e12a via bus e12a350.

The coded base layer frames pass via bus
e12a440 to interpolator e12a450, as described
above, where they are upsampled and passed to the
enhancement encoder via bus e12a460.

ORG e12a470 reorders the high resolution video
frames to match the order of the base layer and
outputs the result on bus e12a480. Subtracter e12a490
computes the difference between the input picture on
bus e12a480 that is to be coded and a spatial
prediction picture on bus e12a460. The prediction error is
output on bus e12a500, transformed by transformer
e12a510, quantized by quantizer e12a530 and
passed via bus e12a540 to variable encoder
e12a550. The quantizer step size used by the
enhancement encoder is computed by quantization
adapter e12a600 depending on the aforementioned
two buffer fullnesses received on buses e12a350 and
e12a590. The step size passes via bus e12a610 to
quantizer e12a530 and to variable encoder e12a550
via bus e12a615.
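
The spatial prediction used by this enhancement encoder can be sketched as: decimate, upsample, subtract. The fragment below is a simplified illustration with assumptions stated plainly: plain subsampling and pel repetition stand in for the decimation and interpolation filters described elsewhere, the coding loss of the base layer is ignored, and all function names are hypothetical.

```python
def decimate_2x(frame):
    """Keep every second pel of every second row (no anti-alias filter)."""
    return [row[::2] for row in frame[::2]]


def upsample_2x(frame):
    """Repeat each pel twice horizontally and each row twice vertically."""
    out = []
    for row in frame:
        wide = [pel for pel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def prediction_error(frame, prediction):
    """Pel-by-pel difference between the input picture and its prediction."""
    return [[x - p for x, p in zip(f_row, p_row)]
            for f_row, p_row in zip(frame, prediction)]


frame = [[10, 12, 20, 22],
         [11, 13, 21, 23],
         [30, 32, 40, 42],
         [31, 33, 41, 43]]
base = decimate_2x(frame)          # base layer picture
prediction = upsample_2x(base)     # spatial prediction for the enhancement layer
error = prediction_error(frame, prediction)
```

The error picture has a much smaller range than the input, which is what makes it cheaper to transform, quantize and encode.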

Variable encoder e12a550 encodes quantized 
transform coefficients input on bus e12a540 and 
quantizer step sizes qs input on bus e12a615 into a 
typically variable bit-rate bit-stream that is output on
bus e12a560. 

This bit-stream on bus e12a560 then passes to
buffer e12a570 for temporary storage until it passes
via bus e12a580 to the Systems Multiplexer. As
described above, the fullness of buffer e12a570 passes
to the Enhancement Encoder via bus e12a590.

Figure e12b shows an example of a base encoder 
and an enhancement encoder corresponding to those 
items shown generally in Fig. c4. Both the base and 
enhancement layers employ progressive signal at full
resolution, but half the frame rate of the original.
Alternately, coding can also be performed according to
Fig. c6. The picture structure for this encoder is
shown in Fig. g12b. For the purpose of explanation of
encoding operations, assume coding according to
Fig. c4. 

High resolution video enters on bus e12b100. A 
temporal demultiplexer e12b110 may be a simple 
switching mechanism which routes alternate frames 
of progressive input to output buses e12b115 and
e12b120, respectively.
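
A temporal demultiplexer of this kind amounts to splitting a frame sequence by parity. The sketch below is illustrative; the function name temporal_demux is hypothetical, and which parity feeds which bus is an assumption, since the text only says that alternate frames go to the two outputs.

```python
def temporal_demux(frames):
    """Route alternate frames of a progressive sequence to two outputs.

    Returns (first_output, second_output). Each output carries the full
    spatial resolution at half the original frame rate.
    """
    return frames[0::2], frames[1::2]


first_output, second_output = temporal_demux([0, 1, 2, 3, 4, 5])
```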

The base encoder in Fig. e12b operates in exactly
the same way as the base encoder in Fig. e12a, except
that it codes full resolution video at half the frame
rate. A replica decoded base layer video is output on
bus e12b440 that is full resolution. Thus, there is no 
need for upsampling prior to delivery to the enhance- 
ment encoder. 

The enhancement encoder in Fig. e12b is similar
to that of Fig. e12a. However, in this case, the
prediction picture on bus e12b440 is temporally shifted
from the video frames on bus e12b115 that are to be 
encoded. For this reason, a simple coding of the dif- 
ference may not be the most efficient method. 

ORG e12b470 reorders the high resolution video
frames to match the order of the base layer and out- 
puts the result on buses e12b480 and e12b485. 

The base layer prediction picture on bus e12b440 
first enters a transition store e12b620 whose contents
are made available on bus e12b630 to motion
estimator e12b640 and motion compensator 
e12b655. 

Motion estimator e12b640 examines the input
frame on bus e12b485 and compares it with the base
layer prediction frame on bus e12b630. Motion
estimator e12b640 outputs motion vectors on bus
e12b650 for use by motion compensator e12b655
and on bus e12b670 for use by variable encoder
e12b550. Motion compensator e12b655 utilizes the
motion vectors and pels from the base layer
prediction frame to compute a motion compensated
prediction that is output on bus e12b460 and passes to
subtracter e12b490.
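
Motion estimation and compensation against a base layer frame can be sketched with block matching. The text does not specify the search method; an exhaustive search minimizing the sum of absolute differences (SAD) is a common choice and is assumed here. All function names are hypothetical.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left pel is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def estimate_motion(current, reference, top, left, size, search):
    """Return ((dy, dx), cost) of the best match within +/- search pels."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best_vector = (0, 0)
    best_cost = sad(target, get_block(reference, top, left, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(reference, y, x, size))
            if cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector, best_cost


def compensate(reference, top, left, size, vector):
    """Motion compensated prediction: the reference block the vector points to."""
    dy, dx = vector
    return get_block(reference, top + dy, left + dx, size)


# Reference frame with unique pel values; the current block at (2, 2)
# is a copy of the reference block at (1, 3).
reference = [[10 * r + c for c in range(6)] for r in range(6)]
current = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        current[2 + r][2 + c] = reference[1 + r][3 + c]

vector, cost = estimate_motion(current, reference, top=2, left=2, size=2, search=2)
prediction = compensate(reference, 2, 2, 2, vector)
```

Because the enhancement frame is temporally shifted from the base layer frame, a nonzero vector of this kind generally gives a smaller prediction error than a simple difference at the same position.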

The remaining operations of the encoding
operation of Fig. e12b are identical to those of Fig. e12a,
except that variable encoder e12b550 inserts the
motion vectors on bus e12b670 into the output
bit-stream.

Figure e12d shows an example of a base encoder 
and enhancement encoder corresponding to those 
items in the system of Fig. c4. Both the base and en- 
hancement layers employ progressive signal at full 
resolution, but half the frame rate of the original. Al- 
ternately, coding can also be performed according to 
Fig. c6. The picture structure for this encoder is 
shown in Fig. g12d. For the purpose of explanation of 
encoding operations, assume coding according to 
Fig. c4. 

High resolution video enters on bus e12d100. 
Temporal demultiplexer e12d110 may be a simple 
switching mechanism that routes alternate frames of 
progressive input to output buses e12d115 and 
e12d120, respectively. 

The base encoder in Fig. E12D operates in the 
same way as the base encoder in Fig. e12b, except 
for the replica base layer video, which is described be- 
low. 

The enhancement encoder of Fig. E12D has sev- 
eral differences compared with that of Fig. e12b. 
First, the enhancement layer video frames are not re- 
ordered prior to encoding. This means that the decod- 
ed base layer frames that are to be used as predic- 
tions are not in the correct order and must be reor- 
dered back to the original camera order. Second, the 
prediction is computed as a weighted average of two 
predictions, as described below. 

An ORG module, as shown in Fig. ED, could be 
used to reorder the replica decoded base layer video 
to match that of the enhancement layer. However, a 
much simpler solution is provided by switch e12d810. 
After the encoding of a B type frame in the base layer,
switch e12d810 is in the "B" position, which routes the
B frame from the output of adder e12d420 via bus 
e12d440 to its output on bus e12d815. During the en- 
coding of I and P type frames, switch e12d810 is in 
the "A" position and routes previously coded frames 
from bus e12d210 via bus e12d800 so that they 
match temporally with the frames being encoded in
the enhancement layer. 

As mentioned above, the enhancement layer video
on bus e12d115 is not reordered prior to encoding.
Thus, delay e12d470 delays the enhancement layer
video on bus e12d115 in order to temporally match the
replica decoded base layer video on bus e12d815. 
The delayed enhancement layer video passes to a 
subtracter e12d490 and a motion estimator e12d640 
via buses e12d480 and e12d485, respectively. 

The base layer prediction picture on bus e12d815 
enters a transition store e12d620 whose contents are
made available on bus e12d630 to the motion
estimator e12d640 and a motion compensator e12d655.

Motion estimator e12d640 examines the en- 
hancement layer input frame on bus e12d485 and 
compares it with the base layer prediction frame on 
bus e12d630. Motion estimator e12d640 outputs mo- 
tion vectors on bus e12d650 for use by the motion 
compensator e12d655 and on bus e12d670 for use by
a variable encoder e12d550. Motion compensator
e12d655 utilizes the motion vectors and pels from the
base layer prediction frame to compute a motion
compensated prediction that passes to weighter
e12d710 on bus e12d690.

Motion estimator e12d640 also examines the en- 
hancement layer input frame on bus e12d485 and 
compares it with the previously coded enhancement 
layer frame on bus e12d680 to compute additional 
motion vectors. Motion estimator e12d640 outputs 
these additional motion vectors also on bus e12d650 
for use by motion compensator e12d655 and on bus 
e12d670 for use by variable encoder e12d550. Mo- 
tion compensator e12d655 utilizes these motion vec- 
tors and pels from the enhancement layer prediction 
frame on bus e12d680 to compute another motion 
compensated prediction that passes to the weighter 
e12d710 on bus e12d700. 

Weighter e12d710 computes a weighted average 
of the two predictions input on buses e12d690 and 
e12d700 and outputs the result on buses e12d720 
and e12d730 to subtracter e12d490 and adder 
e12d780, respectively. The weighting may be fixed, 
or it may adapt to such factors as the amount of
motion in the scene, scene changes, etc. The weights
could be limited to a finite set to minimize
transmission overhead. Or the weights could be limited to 0
and 1, in which case the weighter becomes a simple
switch that passes either the input from bus e12d690
or the input from bus e12d700. 
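
The weighter can be illustrated with a per-pel weighted average. In this sketch the function name weigh and the parameter w_base are hypothetical, and the assumption is made that the two weights sum to one, which the text does not state explicitly.

```python
def weigh(base_prediction, enhancement_prediction, w_base):
    """Weighted average of two predictions, pel by pel.

    w_base is the weight of the base layer prediction; the enhancement
    layer prediction receives 1 - w_base (an assumption of this sketch).
    """
    w_enhancement = 1.0 - w_base
    return [w_base * b + w_enhancement * e
            for b, e in zip(base_prediction, enhancement_prediction)]


base_pred = [100, 120]
enh_pred = [110, 100]
blended = weigh(base_pred, enh_pred, 0.5)
base_only = weigh(base_pred, enh_pred, 1.0)   # weighter acts as a switch
enh_only = weigh(base_pred, enh_pred, 0.0)    # weighter acts as a switch
```

Restricting w_base to 0 or 1 reduces the weighter to the simple switch described above, and a small finite set of weights keeps the side information cheap to transmit.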

The remaining operations of the Enhancement 
Encoding are identical to those of the Base Layer, ex- 
cept for the Quantization Adaptation e12d600 which 
operates in exactly the same way as in Figs. e12a and
e12b. 

Specifically, the prediction error is calculated by 
subtracter e12d490, transformed by transformer
e12d510, quantized by quantizer e12d530, encoded
along with the quantizer step size qs and motion vec- 
tor mv by variable encoder e12d550, sent to buffer 
e12d570, and then sent to the systems multiplexer. 

The decoded enhancement layer video, which is
needed for motion compensation of the next
enhancement layer frame, is calculated in the same way
as in the base layer, except that there are no B-type
frames. Specifically, the quantized transform
coefficients are converted to full range by inverse quantizer
e12d740, converted to prediction error pel values by
inverse transform e12d760, added to the motion
compensated prediction by adder e12d780, and passed to
the previous frame store e12d660 for use in motion
estimation of the next frame.

Figure e12e shows an example of a base encoder 
and an enhancement encoder corresponding to those 
in Fig. c4. Both the base and enhancement layers
employ progressive signal at full resolution, but at half
the frame rate of the original. Alternately, coding can
also be performed according to Fig. c6. The picture 
structure for this encoder is shown in Fig. g12e. For 
the purpose of illustrative explanation of encoding op- 
erations, assume coding according to Fig. c4. 

High resolution and frame rate video enters on
bus e12e100. In this example, the decimator e12e110
is a temporal demultiplexer, a simple switching
mechanism that routes alternate frames of progressive input
video to output buses e12e115 and e12e120,
respectively. The interpolator e12e450 is a 1:1 upsampler
(alternately, there is no need for upsampling in some
examples).

The Base Encoder operates in exactly the same 
way as in Fig. e12b. A replica decoded base layer video
is output on bus e12e440 that is full resolution. The
Enhancement Encoder is similar to that of Fig. e12b. 
It, however, uses bidirectional prediction from base 
layer. 

Delay e12e470 delays the high resolution video
frames of the enhancement layer.

The base layer prediction picture on bus
e12e440, as well as contents of picture stores
e12e210 and e12e220 on respective buses e12e800
and e12e805, are available at switch e12e810.
Depending on the picture being coded in the
enhancement encoder, a specific two of the three available
base layer prediction frames at the input of switch
e12e810 are necessary in the enhancement layer
and pass via a 1:1 interpolator e12e450 to switch
e12e605 and enter transition stores e12e620 and
e12e625, both of whose contents are made available
on bus e12e630 and bus e12e635 to motion estimator
e12e640 and motion compensator e12e655.

For clarity, we refer to the pair of B frames
between every pair of reference frames in the base
layer of Fig. g12e as B1 and B2 frames, and frames
of enhancement layer as the first I 2 frame, the second
I 2 frame, the third I 2 frame and so on. During the
encoding of I and P type frames of the base layer, switch
e12e810 is in the "C" position and routes previously
coded frames from bus e12e210 via bus e12e800
and through the switch e12e815 to bus e12e820 and
further through switch e12e605, which is in the "B"
position, to frame store e12e620. After the encoding of a
B1 frame, switch e12e810 is in the "A" position and
routes the B1 frame from the output of the adder
e12e420 via bus e12e440 to its output on bus
e12e815 and through the switch e12e605, which is in
the "A" position, to the frame store e12e625. At this
time, encoding of the first I 2 frame is accomplished. After
the encoding of a B2 frame, switch e12e810 is in the
"A" position and routes the B2 frame from the output
of adder e12e420 via bus e12e440 to its output on bus
e12e815 and through the switch e12e605, which is in
the "B" position, to frame store e12e620. Now the
encoding of the second I 2 frame is accomplished. At this time,
to accomplish encoding of the third I 2 frame, contents
of picture store e12e200 are routed on bus e12e800
and through switch e12e810, which is in the "B" position,
to switch e12e605, which is in the "A" position, and to
frame store e12e625. This process repeats itself for
coding of subsequent frames.

Motion estimator e12e640 examines the input
frame on bus e12e485 and compares it with the base
layer predictions on bus e12e630 and on bus
e12e635. Motion estimator e12e640 outputs motion
vectors on bus e12e650 for use by motion
compensator e12e655. Motion vectors are also made
available for use by a variable encoder e12e550. Motion
compensator e12e655 utilizes the motion vectors and
pels from the two base layer prediction frames to
compute a motion compensated prediction that is
output on bus e12e460 and passes to subtracter
e12e490.

The remaining operations of the encoding in Fig. 
e12e are identical to those of Fig. e12b. 

Figure e12g shows an example of a base encoder 
and an enhancement encoder corresponding to those 
in Fig. c4. Both the base and enhancement layers
employ progressive signal at full resolution, but at half
the frame rate of the original. Alternately, coding can
also be performed according to Fig. c6. The picture
structure for this encoder is shown in Fig. g12g. For
the purpose of illustrative explanation of encoding
operations, assume coding according to Fig. c4.

High resolution and frame rate video enters on 
bus e12g100. In this example, the decimator
e12g110 is a temporal demultiplexer, a simple switching
mechanism that routes alternate frames of progressive
input video to output buses e12g115 and
e12g120, respectively. The interpolator e12g450 is a
1:1 upsampler. 

The base encoder in Fig. e12g operates in the 
same way the base encoder in Fig. e12e operates. 
The enhancement encoder has a notable difference
compared with the enhancement encoder of Fig.
e12e since it allows not only base layer frames as
predictions but also temporal prediction from the
enhancement layer.

The base layer prediction picture on bus 
e12g440, as well as contents of picture stores 
e12g210 and e12g220 on respective buses e12g800 
and e12g805, are available at switch e12g810. De- 
pending on the picture being coded in the enhance- 
ment encoder, specific two out of three available 
base layer prediction frames at the input of switch 
e12g810 are necessary in the enhancement layer 
and pass via a 1:1 interpolator e12g450 to switch 
e12g605 and enter transition stores e12g620 and 
e12g625 both of whose contents are made available 
on bus e12g630 and bus e12g635 to motion estimator 
e12g640 and motion compensator e12g655. 

In a manner similar to Fig. e12e, we refer to the
pair of B frames between every pair of reference
frames in the base layer of Fig. g12g as B1 and B2
frames, and frames of the enhancement layer as the
first I 2 frame, the first P 2 frame, the second P 2 frame,
and so on. During the encoding of I and P type frames
of the base layer, switch e12g810 is in the "C" position
and routes previously coded frames from bus
e12g210 via bus e12g800 and through the switch
e12g815 to bus e12g820 and further through switch
e12g605, which is in the "B" position, to frame store
e12g620. After the encoding of a B1 frame, switch
e12g810 is in the "A" position and routes the B1 frame
from the output of adder e12g420 via bus e12g440 to
its output on bus e12g815 and through the switch
e12g605, which is in the "A" position, to frame store
e12g625. At this time, encoding of the first I 2 frame
is accomplished; the coded frame appears via bus
e12g790 and is stored in frame store e12g660 to be
used for prediction of the first P 2 frame. After the
encoding of a B2 frame, switch e12g810 is in the "A"
position and routes the B2 frame from the output of
adder e12g420 via bus e12g440 to its output on bus
e12g815 and through the switch e12g605, which is in
the "B" position, to frame store e12g620. Now the
encoding of the first P 2 frame is accomplished; the coded
frame appears via bus e12g790 and is stored in frame store
e12g660 to be used for prediction of the second P 2
frame. At this time, to accomplish encoding of the
second P 2 frame, the contents of picture store
e12g200 are routed on bus e12g800 and through
switch e12g810, which is in the "B" position, to switch
e12g605, which is in the "A" position, and to frame
store e12g625. The coded frame appears via bus
e12g790 and is stored in frame store e12g660 to be
used for prediction of the next P 2 frame. This process
repeats itself for coding of subsequent frames.

Motion estimator e12g640 examines the en- 
hancement layer input frame on bus e12g485 and 
compares it with the base layer prediction frames on 
bus e12g630 and bus e12g635. Motion estimator 
e12g640 outputs motion vectors on bus e12g650 for 
use by motion compensator e12g655 and by variable
encoder e12g550. Motion compensator e12g655 utilizes
the motion vectors and pels from the base layer
prediction frames to compute a motion compensated
prediction that passes to weighter e12g710 on bus
e12g690.

Motion estimator e12g640 also examines the
enhancement layer input frame on bus e12g485 and
compares it with the previously coded enhancement
layer frame on bus e12g680 to compute additional
motion vectors. Motion estimator e12g640 outputs
these additional motion vectors also on bus e12g650
for use by motion compensator e12g655 and for use
by variable encoder e12g550. Motion compensator
e12g655 utilizes these motion vectors and pels from
the enhancement layer prediction frame on bus
e12g680 to compute another motion compensated
prediction that passes to weighter e12g710 on bus 
e12g700. 

Weighter e12g710 computes a weighted average 
of the two predictions input on buses e12g690 and 
e12g700 and outputs the result on buses e12g720
and e12g730 to subtracter e12g490 and adder
e12g780, respectively. The weighting may be fixed,
or it may adapt to such factors as the amount of
motion in the scene, scene changes, and the like. The
weights could be limited to a finite set to minimize
transmission overhead. Or the weights could be
limited to 0 and 1, in which case the weighter becomes
a simple switch that passes either the input from bus 
e12g690 or the input from bus e12g700. 

The remaining operations of enhancement
encoding are identical to those of the Base Layer, except
for the quantization adaptation e12g600, which
operates in exactly the same way as in Figs. e12a and
e12b.

Specifically, the prediction error is calculated by
subtracter e12g490, transformed by transformer
e12g510, quantized by quantizer e12g530, encoded
along with the quantizer step size qs and motion
vector mv by variable encoder e12g550, sent to buffer
e12g570 and thence to the systems multiplexer.

The decoded enhancement layer video, which is 
needed for motion compensation of the next en- 
hancement layer frame, is calculated in the same way 
as in the base layer, except that there are no B-type 
frames. Specifically, the quantized transform
coefficients are converted to full range by inverse quantizer
e12g740, converted to prediction error pel values by
inverse transform e12g760, added to the motion
compensated prediction by adder e12g780 and passed to
the previous frame store e12g660 for use in motion
estimation of the next frame. 

Figure e12h shows an example of a base encoder 
and an enhancement encoder corresponding to those 
of Fig. c4. Both the base and enhancement layers
employ a progressive signal at full resolution, but at half
the frame rate of the original. Alternately, coding can
also be performed according to Fig. c6. The picture
structure for this encoder is shown in Fig. g12h. For
the purpose of illustrative explanation of encoding
operations, assume coding according to Fig. c4.

High resolution and high frame rate video enters 
on bus e12h100. In this example, decimator e12h110 
is a temporal demultiplexer, a simple switching
mechanism that routes alternate frames of progressive
input video to output buses e12h115 and e12h120,
respectively. The interpolator e12h450 is a 1:1 upsampler
(alternately, there may be no need for upsampling).

The base encoder in Fig. e12h operates in the 
same way that the base encoder of Fig. e12d oper- 
ates. 

The enhancement encoder has a very similar op- 
eration to that of the base encoder, except that it uses 
a weighted combination of motion compensated pre- 
diction from the enhancement layer with motion com- 
pensated prediction from the base layer. 

ORG e12h470 reorders the high resolution video 
frames to match the order of the base layer and out- 
puts the result on buses e12h480 and e12h485. 

Motion estimator e12h640 examines the en- 
hancement layer input frame on bus e12h485 and 
compares it with the base layer prediction frame on 
bus e12h630. Motion estimator e12h640 outputs mo- 
tion vectors on bus e12h650 for use by motion com- 
pensator e12h655 and by variable encoder e12h550. 
Motion compensator e12h655 utilizes the motion
vectors and pels from the base layer prediction frame to
compute a motion compensated prediction that passes
to weighter e12h710 on bus e12h690.

Motion estimator e12h640 also examines the en- 
hancement layer input frame on bus e12h485 and 
compares it with the previously coded enhancement 
layer frame on bus e12h680 to compute additional 
motion vectors. Motion estimator e12h640 outputs 
these additional motion vectors also on bus e12h650 
for use by motion compensator e12h655 and for use 
by variable encoder e12h550. Motion compensator 
e12h655 utilizes these motion vectors and pels from 
the enhancement layer prediction frame on bus 
e12h680 to compute another motion compensated 
prediction that passes to weighter e12h710 on bus
e12h700. 

Weighter e12h710 computes a weighted average
of the two predictions input on buses e12h690 and 
e12h700 and outputs the result on buses e12h720 
and e12h730 to subtracter e12h490 and adder 
e12h780, respectively. The weighting may be fixed, 
or it may adapt to such factors as the amount of
motion in the scene, scene changes, and the like. The
weights could be limited to a finite set to minimize 
transmission overhead. Or the weights could be lim- 
ited to 0 and 1, in which case the weighter becomes 
a simple switch that passes either the input from bus 
e12h690 or the input from bus e12h700. 

The remaining operations of the enhancement
encoding are identical to those of the base layer,
except for the quantization adaptation e12h600, which
operates in exactly the same way as it does in Figs.
e12a and e12b.

Specifically, the prediction error is calculated by
subtracter e12h490, transformed by transformer
e12h510, quantized by quantizer e12h530, encoded
along with the quantizer step size qs and motion
vector mv by variable encoder e12h550, sent to buffer
e12h570 and thence to the systems multiplexer.

The decoded enhancement layer video, which is
needed for motion compensation of the next
enhancement layer frame according to the M=3 structure
explained in the description of Fig. g12h, is calculated in
the same way as in the base layer. Specifically, the
quantized transform coefficients are converted to full
range by inverse quantizer e12h740, converted to
prediction error pel values by inverse transform
e12h760, and added to the motion compensated
prediction by adder e12h780. If the result is an I or P
frame, it is passed to the next frame store e12h665 after
shifting the contents of the next frame store to previous
frame store e12h660, for use in motion estimation of the
following frame in coding order according to the M=3
structure.

Figure e12i shows an example of a base encoder 
and an enhancement encoder corresponding to those 
of Fig. c5. The base layer employs an interlaced signal
derived from the progressive original and the
enhancement layer employs progressive signal at full
resolution. The picture structure for this encoder is 
shown in Fig. g12i. 

High resolution and high frame rate video enters
on bus e12i100. The decimator e12i110 is a progressive
to interlace converter according to Fig. F3 and
Fig. C5a as described earlier; its output is routed to
bus e12i120 while the undecimated progressive
source is input as is to the enhancement encoder. The
interpolator e12i450 performs the inverse operation
of the decimator, i.e., interlace to progressive
conversion; this operation is explained earlier according to
Fig. F4.

The operation of base encoder follows exactly 
the description of Fig. e12b, the only difference being 
that the input to base encoder is interlaced rather than
the progressive input as in Fig. e12b. 

The enhancement encoder is quite similar to that 
of Fig. e12a, the only difference being that it operates 
at twice the frame rate of base encoder. 

ORG e12i470 reorders pairs of high resolution
video frames to match the order of each base layer 
frame and outputs the result on buses e12i480 and 
e12i485. It is important to notice that the base layer 
here processes interlaced frames that appear at half 
the frame rate of the enhancement layer.

The output of the base encoder, comprising inter- 
laced frames, is available at bus e12i440 and passes 
through interlaced to progressive converter e12i450.
The resulting signal on line e12i460 is applied to a
switch e12i605 whose output e12i860 either passes 
directly to a next switch e12i880 or to a frame store 
e12i620 and on to bus e12i870 as the second input 
to switch e12i880. This is so because, after interlaced 
to progressive conversion, each interlaced frame re- 
sults in two progressive frames, only one of which can 
be output directly on line e12i890 while the other one 
is stored for prediction of a next enhancement layer 
picture by storing it in e12i620 until it is needed.
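
The role of the frame store can be pictured with a short sketch: each converted base layer frame yields a pair of progressive frames, the first of which is used at once while the second waits for the following enhancement layer picture. The function name prediction_sequence and the string labels are hypothetical, and the conversion itself (Fig. F4) is not modelled.

```python
def prediction_sequence(converted_pairs):
    """Return one prediction per enhancement layer frame.

    converted_pairs holds, for each interlaced base layer frame, the two
    progressive frames produced by interlace to progressive conversion.
    """
    predictions = []
    for first, second in converted_pairs:
        frame_store = second              # held back in the frame store
        predictions.append(first)         # passed directly to the output switch
        predictions.append(frame_store)   # released for the next enhancement frame
    return predictions


sequence = prediction_sequence([("a0", "a1"), ("b0", "b1")])
```

The output carries twice as many predictions as there are base layer frames, which matches an enhancement layer running at twice the base layer frame rate.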

Subtracter e12i490 computes the difference between
the input picture on bus e12i480 that is to be
coded and the prediction picture on bus e12i890. The
prediction error is output on bus e12i500, transformed
by transformer e12i510, quantized by quantizer
e12i530, and passed via bus e12i540 to variable
encoder e12i550. The quantizer step size used by the
enhancement encoder is computed by quantization
adapter e12i600 depending on the two buffer
fullnesses on buses e12i350 and e12i590. The step size
passes via bus e12i610 to quantizer e12i530 and to
variable encoder e12i550 via bus e12i615. Variable
encoder e12i550 encodes quantized transform
coefficients input on bus e12i540 and quantizer step sizes
qs input on bus e12i615 into a typically variable
bit-rate bit-stream that is output on bus e12i560. As
mentioned earlier, the prediction picture for the first of each
pair of progressive frames on bus e12i480 comes
directly from bus e12i860, and for the second
progressive frame from bus e12i870. This process is
repeated for subsequent frames.

Figure e12j shows an example of a base encoder 
and an enhancement encoder corresponding to those 
of Fig. c5. The base layer employs an interlaced sig- 
nal derived from the progressive original and the en- 
hancement layer employs progressive signal at full 
resolution. The picture structure for this encoder is 
shown in Fig. g12j. 

High resolution and high frame rate video enters 
on bus e12j100. The decimator e12j110 is a progressive
to interlace converter according to Fig. F3 and
Fig. C5a as described earlier; its output is routed to
bus e12j120 while the undecimated progressive
source is input as is to the enhancement encoder. The
interpolator e12j450 performs the inverse operation
of the decimator, i.e., interlace to progressive
conversion; this operation is explained earlier according to
Fig. F4. 

The operation of base encoder follows exactly 
the description of Fig. e12d, the only difference being 
that the input to the base encoder is interlaced rather 
than the progressive input as in Fig. e12d.

Delay e12j470 delays the high resolution video
frames at the input of the enhancement encoder. It is 
important to notice that the base layer here processes 
interlaced frames that appear at half the frame rate 
of enhancement layer. 

The enhancement encoder is an extension of the
encoder of Fig. e12a and uses not only the prediction
from the base layer but also motion compensated
prediction from the enhancement layer as in Fig.
e12d. Moreover, it operates at twice the frame rate of
base encoder in Fig. e12i.

The output of the base encoder, comprising
interlaced frames, is available at bus e12j440 and passes
through interlaced to progressive converter e12j450.
The resulting signal on line e12j460 is applied to a
switch e12j605 whose output e12j860 either passes
directly to a next switch e12j880 or to frame store
e12j620 and on to bus e12j870 as the second input to
switch e12j880. This is so because, after interlaced to
progressive conversion, each interlaced frame
results in two progressive frames, only one of which can
be output directly on line e12j890, while the other one
is held in frame store e12j620 until it is needed for
prediction of a next enhancement layer picture.

Reordering of frames of the base layer is
accomplished via a switch e12j810. After the encoding of a
B type frame in the base layer, switch e12j810 is in
the "B" position and routes the B-frame from the
output of adder e12j420 via bus e12j440 to its output on
bus e12j815. During the encoding of I and P type
frames in the base layer, switch e12j810 is in the "A"
position and routes previously coded frames from bus
e12j210 via bus e12j800 so that they match
temporally with the frame being encoded in the
enhancement layer.

As mentioned earlier, the enhancement layer
video on bus e12j100 is not reordered prior to encoding.
Thus, delay e12j470 delays the enhancement layer
video on bus e12j100 in order to temporally match the
replica decoded base layer video on bus e12j815. The
delayed enhancement layer video passes to
subtracter e12j490 and motion estimator e12j640 via buses
e12j480 and e12j485, respectively.

The base layer prediction picture on bus e12j815
enters interlace to progressive interpolator e12j450
and then to bus e12j820 which is applied to a switch
e12j605 whose output e12j660 either passes directly
to a next switch e12j880 or to a frame store e12j620
and on to bus e12j670 as the second input to switch
e12j880.

Motion estimator e12j640 examines the
enhancement layer input frame on bus e12j485 and
compares it with the previously coded enhancement
layer frame on bus e12j680 to compute additional
motion vectors. Motion estimator e12j640 outputs these
additional motion vectors on bus e12j650 for use
by motion compensator e12j655 and for use by
variable encoder e12j550. Motion compensator e12j655
utilizes these motion vectors and pels from the
enhancement layer prediction frame on bus e12j680 to
compute a motion compensated prediction that passes
to weighter e12j710 on bus e12j700.

Weighter e12j710 computes a weighted average
of the two predictions input on buses e12j890 and
e12j700 as described for the encoder of Fig. e12d.
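
The weighter's operation can be sketched as a per-pel weighted
average of the base layer prediction and the enhancement layer
motion compensated prediction. The single scalar weight and the
function name below are assumptions made for illustration; the text
does not state how the weights are chosen.

```python
def weighted_prediction(base_pred, enh_pred, weight):
    """Combine two predictions pel by pel.

    weight is the share given to the base layer prediction; the
    enhancement layer prediction receives (1 - weight).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(base_pred) != len(enh_pred):
        raise ValueError("predictions must have the same length")
    return [weight * b + (1.0 - weight) * e
            for b, e in zip(base_pred, enh_pred)]
```

With a weight of 1.0 the result reduces to the base layer prediction
alone, and with 0.0 to purely temporal prediction within the
enhancement layer.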

The remaining operations of the enhancement
encoding are identical to those of the base layer,
except for the quantization adaptation e12j600 which
operates in exactly the same way as in Figs. e12a and
e12b.

Figure d12a shows an example of a base decoder
and an enhancement decoder corresponding to those
of Fig. c3.

The base decoder may be an MPEG decoder,
which for generality is shown as decoding I, B and P
pictures according to the structure of Fig. G12A. The
received bit-stream on bus d12a340 passes from the
systems demultiplexer to a buffer d12a330 for
temporary storage until it passes via bus d12a320 to a
variable decoder d12a310.

Variable decoder d12a310 decodes quantized
transform coefficients which are then output on bus
d12a300, quantizer step sizes qs which are then
output on bus d12a370, and motion vectors which are
then output on buses d12a175 and d12a305.

Motion compensator d12a180 utilizes the motion
vectors on bus d12a175 and pels from previously
decoded frames on buses d12a210 and d12a220 to
compute (for P and B type frames) a motion
compensated prediction that is output on bus d12a240. For I
type frames, motion compensator d12a180 outputs
zero pel values.

Qs signals pass from variable decoder d12a310
via bus d12a370 to inverse quantizer d12a380.

Quantized transform coefficients pass on bus
d12a300 from variable decoder d12a310 to inverse
quantizer d12a380. Inverse quantizer d12a380
converts the quantized transform coefficients back to full
range and passes the result via bus d12a390 to
inverse transform d12a400, which outputs pel
prediction error values on bus d12a410. Adder d12a420
adds the prediction error values on bus d12a410 to
the prediction values on bus d12a240 to form the
decoded base layer pels on buses d12a430, d12a435,
and d12a440.
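
The inverse quantizer, inverse transform and adder chain can be
sketched as follows. Two assumptions are made that the text does not
spell out: the inverse quantizer is modeled as a uniform scaling of
each level by the step size qs, and the transform is taken to be an
orthonormal DCT, as is usual for an MPEG decoder. All function names
are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal inverse DCT of a 1-D list of coefficients."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = coeffs[0] / math.sqrt(n)
        for u in range(1, n):
            total += (math.sqrt(2.0 / n) * coeffs[u]
                      * math.cos(math.pi * (2 * x + 1) * u / (2 * n)))
        out.append(total)
    return out


def idct_2d(block):
    """Separable 2-D inverse DCT: rows first, then columns."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def reconstruct_block(levels, qs, prediction):
    """Decode one block: dequantize, inverse transform, add prediction."""
    coeffs = [[level * qs for level in row] for row in levels]  # full range
    error = idct_2d(coeffs)                                     # pel errors
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(prediction, error)]
```

For an I type frame the prediction block is all zeros, so the
decoded pels equal the inverse transformed prediction error.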

For I and P type frames, switch d12a435 passes
the decoded pels on bus d12a430 via bus d12a205 to
the next picture store d12a200. Simultaneously, the
frame that was in next picture store d12a200 passes
via bus d12a195 to previous picture store d12a190.
For B type frames, switch d12a435 takes no action,
and the contents of picture stores d12a190 and
d12a200 remain unchanged.

The contents of picture stores d12a190 and
d12a200 pass to motion estimator d12a170 and
motion compensator d12a180 via buses d12a210 and
d12a220 for use as needed by those elements.
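
The picture store update rule can be sketched as a small state
holder. The class and attribute names are hypothetical; only the
update behavior for I, P and B type frames is taken from the text.

```python
class ReferenceStores:
    """Holds the previous and next reference pictures of a decoder."""

    def __init__(self):
        self.previous = None   # plays the role of store d12a190
        self.next = None       # plays the role of store d12a200

    def update(self, frame_type, decoded):
        """Apply the store update for one decoded frame."""
        if frame_type in ("I", "P"):
            # The old "next" picture moves to "previous", and the
            # newly decoded reference becomes "next".
            self.previous = self.next
            self.next = decoded
        elif frame_type != "B":
            raise ValueError("frame_type must be 'I', 'P' or 'B'")
        # B type frames leave both stores unchanged.
```

Because B type frames never enter the stores, they can be predicted
from both stored references without affecting later predictions.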

ORG d12a130 reorders the base layer decoded
output frames on bus d12a435 in preparation for
display on bus d12a120.
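
The reordering performed by an ORG element can be sketched as
follows, assuming the usual MPEG convention that each reference
frame is decoded before the B frames that precede it in display
order. The function name and frame labels are hypothetical.

```python
def reorder_for_display(decoded):
    """Convert decoding order to display order.

    decoded is a list of (frame_type, label) pairs in decoding order.
    A reference frame (I or P) is held back until the next reference
    arrives; B frames are passed through immediately.
    """
    held = None
    display = []
    for frame_type, label in decoded:
        if frame_type == "B":
            display.append(label)
        else:
            if held is not None:
                display.append(held)
            held = label
    if held is not None:
        display.append(held)   # flush the last reference frame
    return display
```

For an M=3 structure, the decoding order I0, P3, B1, B2, P6, B4, B5
is returned as I0, B1, B2, P3, B4, B5, P6.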

The decoded base layer frames pass via bus
d12a440 to interpolator d12a450, where they are
upsampled and passed to the enhancement decoder via
bus d12a460.

As described herein above, the enhancement
layer bit-stream passes from the systems
demultiplexer to buffer d12a570 via bus d12a580 for
temporary storage until it passes via bus d12a560 to the
variable decoder d12a550.

Variable decoder d12a550 decodes quantized
transform coefficients output on bus d12a540 and
quantizer step sizes qs output on bus d12a610.
Quantizer step sizes qs pass from bus d12a610 to inverse
quantizer d12a530.

Quantized transform coefficients pass on bus
d12a540 from variable decoder d12a550 to inverse
quantizer d12a530. Inverse quantizer d12a530
converts the quantized transform coefficients on bus
d12a540 back to full range and passes the result via
bus d12a520 to inverse transform d12a510, which
outputs pel prediction error values on bus d12a500.
Adder d12a490 adds the prediction error values on
bus d12a500 to the prediction values on bus d12a460
to form the decoded enhancement layer pels on bus
d12a480.

ORG d12a470 reorders the high resolution video
frames on bus d12a480 to match the order of the base
layer and outputs the result on bus d12a100 for
display.

High frame rate and high resolution progressive 
format video thus exits on bus d12a100. 

Fig. d12b shows a base decoder and an
enhancement decoder corresponding to the encoder
apparatus of Fig. e12b. If coding is done following Fig. c4,
both the base and enhancement layers use a
progressive signal at full resolution, but at half the frame rate
of the original. Alternately, if the codec of Fig. c6 is
employed, both the base and enhancement layers use
interlaced signals. The picture structure for this
decoder is shown in Fig. g12d. For the purpose of illustrative
explanation of the operation of Fig. d12b, assume
coding according to Fig. c4.

The base decoder of Fig. D12B operates in
exactly the same way as the base decoder in Fig. d12a,
except that it decodes full resolution video at half the
frame rate. A decoded base layer reordered video is
output on buses d12b440, d12b140 and d12b430.
The base layer video on bus 140 is reordered into
camera order by ORG 130 and output on buses 120
and 125. The base layer video is passed to the base
layer display via bus 120.

The enhancement decoder in Fig. D12B is similar
to that of Fig. d12a. However, in this case, the
prediction picture on bus d12b440 is temporally shifted
from the video frames on bus d12b115 that are to be
decoded.

The decoded base layer video is full resolution.
Thus, there is no need for upsampling prior to delivery
to the enhancement decoder. The base layer
prediction picture on bus d12b440 first enters a transition
store d12b620 whose contents are made available on
bus d12b630 to motion compensator d12b655.

The enhancement layer bit-stream passes from 
the systems demultiplexer to buffer d12b570 via bus 
d12b580 for temporary storage until it passes via bus 
d12b560 to the variable decoder d12b550. 

Variable decoder d12b550 decodes quantized 
transform coefficients which are output on bus 
d12b540, quantizer step sizes qs which are output on 
bus d12b610 and motion vectors which are output on 
buses d12b670 and d12b650. Quantizer step sizes qs 
pass from bus 610 to inverse quantizer 530. 

Motion compensator d12b655 utilizes the en- 
hancement layer motion vectors on bus d12b650 and 
pels from the base layer prediction frame on bus 
d12b630 to compute a motion compensated predic- 
tion that is output on bus d12b460 and passes to ad- 
der d12b490. 

Quantized transform coefficients pass on bus 
d12b540 from variable decoder d12b550 to inverse 
quantizer d12b530. Inverse quantizer d12b530 con- 
verts the quantized transform coefficients on bus 
540 back to full range and passes the result via bus 
d12b520 to inverse transform d12b510, which out- 
puts pel prediction error values on bus d12b500. Ad- 
der d12b490 adds the prediction error values on bus 
d12b500 to the prediction values on bus d12b460 to 
form the decoded enhancement layer pels on bus 
d12b480. 

ORG d12b470 reorders the high resolution video
frames on bus 480 to match the order of the base
layer and outputs the result on bus d12b115.

Temporal multiplexer d12b110 may be a simple
switching mechanism that routes alternate frames of
progressive input on buses d12b115 and d12b125,
respectively, to output bus d12b100.
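
A temporal multiplexer of this kind can be sketched as an
alternating merge of two frame sequences. The sketch assumes the
base layer frame of each pair comes first, consistent with the base
layer carrying the even numbered frames under Fig. c4; the function
name is hypothetical.

```python
def temporal_multiplex(base_frames, enhancement_frames):
    """Interleave two half-rate sequences into one full-rate sequence."""
    if len(base_frames) != len(enhancement_frames):
        raise ValueError("both inputs must carry the same number of frames")
    output = []
    for base, enhancement in zip(base_frames, enhancement_frames):
        output.append(base)          # even numbered frame
        output.append(enhancement)   # odd numbered frame
    return output
```

Two 30 Hz inputs thus produce a single 60 Hz output sequence.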

High resolution and high frame rate progressive 
format video thus exits on bus d12b100. 

Fig. d12d shows an example of a base decoder
and an enhancement decoder corresponding to the
encoder apparatus of Fig. e12d. If coding is done
following Fig. c4, both the base and enhancement layers
use a progressive signal at full resolution, but at half
the frame rate of the original. Alternately, if the codec of
Fig. c6 is employed, both the base and enhancement
layers use interlaced signals. The picture structure for
this decoder is shown in Fig. g12d. For the purpose
of illustrative explanation of the operation of Fig. d12d,
assume coding according to Fig. c4.

The base decoder of Fig. D12D operates in the
same way as in Fig. d12b, except for the reordering
of the decoded base layer video, as mentioned above
for the encoder e12d.

The enhancement decoder of Fig. D12D has
several differences compared with that of Fig. d12b, as
mentioned above for encoder e12d. Since the
enhancement layer video frames are not reordered prior
to encoding, the decoded base layer frames that are
to be used as predictions must be reordered by switch
d12d810.

During the decoding of B type frames, switch 
d12d810 is in the "B" position and routes B-frames 
from the output of adder d12d420 via bus d12d440 to 
its output on bus d12d815. During the decoding of I 
and P type frames, switch d12d810 is in the "A" pos- 
ition and routes previously coded frames from bus 
d12d210 via bus d12d800 so that they match tempor- 
ally with the frames being decoded in the enhance- 
ment layer. The base layer prediction picture on bus 
d12d815 enters transition store d12d620 whose con- 
tents are made available on bus d12d630 to motion 
compensator d12d655. 

The enhancement layer bit-stream passes from 
the systems demultiplexer to buffer d12d570 via bus 
d12d580 for temporary storage until it passes via bus 
d12d560 to the variable decoder d12d550. 

Variable decoder d12d550 decodes quantized
transform coefficients, which it outputs on bus
d12d540; quantizer step sizes qs, which it outputs
on bus d12d610; and motion vectors, which it outputs
on buses d12d670 and d12d650. Quantizer step
sizes qs pass from bus d12d610 to inverse quantizer
d12d740. Motion compensator d12d655 utilizes the
enhancement layer motion vectors on bus d12d650
and pels from the base layer prediction frame on bus
d12d630 to compute a motion compensated
prediction that is output on bus d12d690 and passed to
weighter d12d710.

Motion compensator d12d655 also utilizes the 
enhancement layer motion vectors on bus d12d650 
and pels from the previously decoded enhancement 
layer frame on bus d12d680 to compute a motion 
compensated prediction that is output on bus 
d12d700 and passed to weighter d12d710. 

Weighter d12d710 computes a weighted average 
of the two predictions input on buses d12d690 and 
d12d700 and outputs the result on buses d12d720 
and d12d730 to subtracter d12d490 and adder 
d12d780, respectively. The weighting used in com- 
puting the prediction is the same as was used during 
the encoding process. The remaining operations of 
the enhancement decoding are identical to those of 
the Base Layer. Specifically, the quantized transform 
coefficients on bus d12d540 are converted to full 
range by inverse quantizer d12d740, converted to 
prediction error pel values by inverse transform 
d12d760, added to the motion compensated
prediction on bus d12d720 by adder d12d780, and output
on buses d12d790 and d12d115 as decoded
enhancement layer video.

The video on bus d12d790 is passed to the
previous frame store d12d660 for use in motion
compensation of the next frame. The video on bus d12d115
is passed to the temporal multiplexer d12d110.
Temporal multiplexer d12d110 may be a simple
switching mechanism that routes alternate frames of
progressive input from input buses d12d115 and
d12d120, respectively, to bus d12d100. High
resolution video thus exits on bus d12d100.

Fig. d12e shows the base decoder and
enhancement decoder corresponding to the encoder
apparatus of Fig. e12e. If coding is done following Fig. c4,
both the base and enhancement layers use a
progressive signal at full resolution, but at half the frame
rate of the original. Alternately, if the codec of Fig. c6
is employed, both the base and enhancement layers
use interlaced signals. The picture structure for this
decoder is shown in Fig. g12e. For the purpose of
illustrative explanation of Fig. d12e, assume coding
according to Fig. c4.

The base decoder operates in the same way as
in Fig. d12d, except for the reordering of the decoded
base layer video.

The operation of the enhancement decoder is
similar to that of Fig. d12b. Since the enhancement
layer video frames are not reordered prior to
encoding, the decoded base layer frames that are to be
used as predictions must be reordered by switch
d12e810.

The base layer prediction picture on bus
d12e430, as well as the contents of picture stores
d12e190 and d12e200 on respective buses d12e800
and d12e805, are available at switch d12e810.
Depending on the picture being decoded in the
enhancement decoder, a specific two of the three available
base layer prediction frames at the input of switch
d12e810 are necessary in the enhancement layer.
These pass via a 1:1 interpolator d12e450 to switch
d12e605 and enter transition stores d12e620 and
d12e625, whose contents are made available
on bus d12e630 and bus d12e635 to motion
compensator d12e655.

We refer to the pair of B frames between every pair
of reference frames in the base layer of Fig. g12e as
B1 and B2 frames, and to the frames of the enhancement
layer as the first I¹ frame, the second I¹ frame, the
third I¹ frame, and so on. During the decoding of I and
P type frames of the base layer, switch d12e810 is in
the "A" position and routes previously coded frames
from bus d12e210 through the switch d12e810 and
bus d12e815 to bus d12e820 and further through
switch d12e605, which is in the "B" position, to frame
store d12e620. After the decoding of a B1 frame, switch
d12e810 is in the "B" position and routes the B1 frame
from the output of adder d12e420 via bus d12e430 to
its output on bus d12e815 and through the switch
d12e605, which is in the "A" position, to frame store
d12e625. At this time, decoding of the first I¹ frame
is accomplished. After the decoding of a B2 frame,
switch d12e810 is in the "B" position and routes the B2
frame from the output of adder d12e420 via bus
d12e430 to its output on bus d12e815 and through the
switch d12e605, which is in the "B" position, to frame
store d12e620. Now the decoding of the second I¹ frame
is accomplished. At this time, to accomplish decoding of
the third I¹ frame, the contents of picture store d12e200
are routed on bus d12e805 and through switch
d12e810, which is in the "C" position, to switch
d12e605, which is in the "A" position, and to frame
store d12e625. This process repeats itself for the
coding of subsequent frames.
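
The switch sequence above can be traced with a short simulation. It
is only a sketch of the routing as described in the text: the step
table, the labels "ref_A" and "ref_C" for the reference frames
routed through positions "A" and "C", and the label "I1" for the
enhancement layer frames are all hypothetical.

```python
# One base layer M=3 period. Each step lists the frame routed, the
# position of switch 810, the position of switch 605, and the
# enhancement frame decoded once the step completes (or None).
STEPS = [
    ("ref_A", "A", "B", None),
    ("B1", "B", "A", "first I1"),
    ("B2", "B", "B", "second I1"),
    ("ref_C", "C", "A", "third I1"),
]

# Per the text, switch 605 in "B" feeds store 620 and in "A" store 625.
STORE_FOR_605 = {"B": "620", "A": "625"}


def run_steps(steps):
    """Return (enhancement frame, store 620, store 625) per decode."""
    stores = {"620": None, "625": None}
    decoded = []
    for frame, _pos_810, pos_605, enhancement in steps:
        stores[STORE_FOR_605[pos_605]] = frame
        if enhancement is not None:
            decoded.append((enhancement, stores["620"], stores["625"]))
    return decoded
```

Tracing the table shows that each enhancement layer frame is decoded
while the two transition stores hold the two base layer frames that
surround it in time, which matches the statement that two of the
three available base layer frames are needed at any moment.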

The remaining operations of enhancement
decoding are identical to those of the base layer.
Specifically, the quantized transform coefficients on bus
d12e540 are converted to full range by inverse
quantizer d12e740, converted to prediction error pel
values by inverse transform d12e760, added to the
motion compensated prediction on bus d12e720 by
adder d12e780 and output on bus d12e115 as decoded
enhancement layer video.

The video on bus d12e115 is passed to converter
d12e105, a temporal multiplexer switching
mechanism that routes alternate frames of progressive input
video from buses d12e115 and d12e120,
respectively. High resolution and high frame rate video exits on
bus d12e100.

Fig. d12g shows the base decoder and
enhancement decoder corresponding to the encoder
apparatus of Fig. e12g. If coding is done following Fig. c4,
both the base and enhancement layers use a
progressive signal at full resolution, but at half the frame rate
of the original. Alternately, if the codec of Fig. c6 is
employed, both the base and enhancement layers use
interlaced signals.

The picture structure for this decoder is shown in 
Fig. g12g. For the purpose of illustrative explanation 
of Fig. d12g, assume coding according to Fig. c4. 

The base decoder operates in the same way as
the base decoder in Fig. d12e.

The enhancement decoder has a noticeable
difference compared with the enhancement decoder of
Fig. d12e, since it uses not only base layer frames as
predictions but also temporal prediction from the
enhancement layer.

The base layer prediction picture on bus
d12g430, as well as the contents of picture stores
d12g210 and d12g220 on respective buses d12g800
and d12g805, are available at switch d12g810.
Depending on the picture being decoded in the
enhancement decoder, a specific two of the three available
base layer prediction frames at the input of switch
d12g810 are necessary in the enhancement layer.
These pass via a 1:1 interpolator d12g450 to a switch
d12g605 and enter transition stores d12g620 and
d12g625, whose contents are made available
on bus d12g630 and bus d12g635 to motion estimator
d12g640 and motion compensator d12g655.

In a manner similar to Fig. d12e, we refer to the pair
of B frames between every pair of reference frames
in the base layer of Fig. g12g as B1 and B2 frames,
and to the frames of the enhancement layer as the first I¹
frame, the first P¹ frame, the second P¹ frame, and
so on. During the decoding of I and P type frames of
the base layer, switch d12g810 is in the "A" position
and routes the previous decoded frame from bus
d12g210 via bus d12g800 and through the switch
d12g815 to bus d12g820 and further through switch
d12g605, which is in the "B" position, to frame store
d12g620. After the decoding of a B1 frame, switch
d12g810 is in the "B" position and routes the B1 frame
from the output of adder d12g420 via bus d12g440 to
its output on bus d12g815 and through the switch
d12g605, which is in the "A" position, to frame store
d12g625. At this time, decoding of the first I¹ frame
is accomplished; the decoded frame appears via bus
d12g790 and is stored in frame store d12g660 to be
used for prediction of the first P¹ frame. After the
decoding of a B2 frame, switch d12g810 is in the "B"
position and routes the B2 frame from the output of adder
d12g420 via bus d12g440 to its output on bus
d12g815 and through the switch d12g605, which is in
the "B" position, to frame store d12g620. Now that
the decoding of the first P¹ frame is accomplished, the
decoded frame appears via bus d12g790 and is
stored in frame store d12g660 to be used for prediction
of the second P¹ frame. At this time, to accomplish
decoding of the second P¹ frame, the contents of picture
store d12g200 are routed on bus d12g800 and
through switch d12g810, which is in the "C" position,
to switch d12g605, which is in the "A" position, and to
frame store d12g625. The coded frame appears via
bus d12g790 and is stored in frame store d12g660 to
be used for prediction of the next P¹ frame. This process
repeats itself for the coding of subsequent frames.

Weighter d12g710 computes a weighted average
of the two predictions input on buses d12g690 and
d12g700 and outputs the result on buses d12g720
and d12g730 to adder d12g780.

The remaining operations of the enhancement
decoding are identical to those of the base layer.
Specifically, the quantized transform coefficients on bus
d12g540 are converted to full range by inverse
quantizer d12g740, converted to prediction error pel
values by inverse transform d12g760, added to the
motion compensated prediction on bus d12g720 by
adder d12g780 and output on bus d12g480, after which
frames are reordered in d12g470 and output on
d12g115 as decoded enhancement layer video. The
video on bus d12e115 is passed to converter
d12e105, a temporal multiplexer switch that routes
alternate frames of progressive input video from buses
d12e115 and d12e120, respectively. High resolution
and high frame rate video exits on bus d12e100.

Fig. d12h shows the base decoder and
enhancement decoder corresponding to the encoder
apparatus of Fig. e12h. If coding is done following Fig. c4,
both the base and enhancement layers use a
progressive signal at full resolution, but at half the frame
rate of the original. Alternately, if the codec of Fig. c6 is
employed, both the base and enhancement layers use
interlaced signals. The picture structure for this
decoder is shown in Fig. g12h. For the purpose of
illustrative explanation of Fig. d12h, we assume coding
according to Fig. c4.

The base decoder operates in the same way as 
the base decoder in Fig. d12d. 

The enhancement decoder has a very similar op- 
eration to that of the base decoder, except that it uses 
a weighted combination of motion compensated pre- 
diction from the enhancement layer with motion com- 
pensated prediction from the base layer. 

Motion compensator d12h655 utilizes the
decoded motion vectors, and pels from the base layer
prediction frame, to compute a motion compensated
prediction that passes to weighter d12h710 on bus
d12h690. Motion compensator d12h655 also utilizes
decoded motion vectors, and pels from the
enhancement layer prediction frame on bus d12h680, to
compute another motion compensated prediction that
passes to weighter d12h710 on bus d12h700.

Weighter d12h710 computes a weighted average
of the two predictions input on buses d12h690 and
d12h700 and outputs the result on bus d12h720 to
adder d12h780.

The remaining operations of enhancement
decoding are identical to those of the base layer.
Specifically, the quantized transform coefficients on bus
d12h540 are converted to full range by inverse
quantizer d12h740, converted to prediction error pel
values by inverse transform d12h760, added to the
motion compensated prediction on bus d12h720 by
adder d12g780 and output on bus d12h480, after which
frames are reordered in d12h470 and output on
d12g115 as decoded enhancement layer video. The
video on bus d12e115 is passed to converter
d12e105, a temporal multiplexer switch that routes
alternate frames of progressive input video from buses
d12e115 and d12e120, respectively. High resolution
and high frame rate video exits on bus d12e100.

Fig. d12i shows the base decoder and enhance- 
ment decoder corresponding to the encoder appara- 
tus of Fig. e12i. Coding is done following Fig. c5. The 
base layer thus uses interlaced signals derived from 
the progressive original and the enhancement layer 
uses a full resolution progressive original. The picture 
structure for this decoder is shown in Fig. g12i. 

The operation of the base decoder in Fig. d12i
follows exactly the description of Fig. d12b, the only
difference being that the output of the base decoder in
Fig. d12i is interlaced rather than the progressive
video as in Fig. d12b.

The base layer prediction picture on bus d12i440
enters interlace to progressive interpolator d12i450
and then to bus d12i460 which is applied to a switch
d12i605 whose output d12i860 either passes directly
to next switch d12i880 or to frame store d12i620 and
on to bus d12i870 as the second input to switch
d12i880.

The operation of the enhancement decoder is quite
similar to that of Fig. d12a, one notable difference
being that it operates at twice the frame rate of the base
decoder.

High resolution and high frame rate video exits
on bus d12i100.

Fig. d12j shows the base decoder and
enhancement decoder corresponding to the encoder
apparatus of Fig. e12j. Coding is done following Fig. c5. The
base layer thus uses interlaced signals derived from
the progressive original and the enhancement layer
uses a full resolution original. The picture structure
for this decoder is shown in Fig. g12j.

The operation of this base decoder follows
exactly the description of the base decoder of Fig. d12d,
the only difference being that the output of the base
decoder is interlaced rather than the progressive
output as in Fig. d12d.

Reordering of frames of the base layer is
accomplished via switch d12j810. After the encoding of a B
type frame in the base layer, switch d12j810 is in the
"B" position and routes the B-frame from the output
of adder e12j420 via bus e12j440 to its output on bus
e12j815. During the encoding of I and P type frames
in the base layer, switch e12j810 is in the "A" position
and routes previously coded frames from bus
e12j210 via bus e12j800 so that they match
temporally with the frame being encoded in the
enhancement layer.

The base layer prediction picture on bus d12j815
enters interlace to progressive interpolator d12j450
and then is directed to bus d12j820 which is applied
to a switch d12j605 whose output d12j860 either
passes directly to next switch d12j880 or to frame
store d12j620 and then on to bus d12j870 as the
second input to switch d12j880.

Motion compensator d12j655 utilizes decoded
motion vectors and pels from the enhancement layer
prediction frame on bus d12j680 to compute a motion
compensated prediction that passes to weighter
d12j710 on bus d12j700.

Weighter d12j710 computes a weighted average
of the two predictions input on buses d12j890 and
d12j700 as described for the decoder of Fig. d12d.

The operation of the enhancement decoder is quite
similar to that of Fig. d12d, one notable difference
being that it operates at twice the frame rate of the base
decoder.

High resolution and high frame rate video exits
on bus d12j115.

In MPEG terminology, three basic types of
pictures, I- (Intra), P- (Unidirectional Predictive) and B-
(Bidirectional Predictive) pictures, are allowed. A
picture structure is an arrangement of these picture
types, and is identified by the value of 'M', the distance
between a P-picture and the previous decoded reference
picture, which is a P-picture except at the beginning
of a Group-of-Pictures (GOP), where it is an I-picture.
The value of 'N', the distance between corresponding
I-pictures, gives the length of a GOP. B-pictures are
predicted noncausally and are outside the interframe
coding loop of P-pictures. B-pictures are predicted
using two references, an immediate previous
decoded I- or P-picture, and an immediate next decoded P-
or I-picture. The number of B-pictures between two
reference pictures is given by the value of 'M-1'.
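
The roles of 'M' and 'N' can be illustrated by generating the
picture types of one GOP in display order. This is a simplified
sketch that assumes the GOP starts with its I-picture; the function
name is hypothetical.

```python
def picture_types(n, m):
    """Return the picture types of one GOP in display order.

    n is the GOP length (distance between I-pictures) and m is the
    distance between successive reference pictures.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")      # start of the GOP
        elif i % m == 0:
            types.append("P")      # reference picture every m frames
        else:
            types.append("B")      # m - 1 B-pictures between references
    return types
```

For example, picture_types(9, 3) gives I B B P B B P B B, with
M-1 = 2 B-pictures between each pair of reference pictures, while
M=1 yields a structure with no B-pictures at all.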

Fig. G12a shows picture structures for the base
and the enhancement layers; the corresponding
structures are applied to the base and the
enhancement layer encoders. The base layer employs an M=3
structure with two B-pictures between the previous
and the next reference pictures. The enhancement
layer simply consists of I-pictures (no temporal
prediction within the same layer) but uses spatially
interpolated base layer pictures for prediction. The base
and the enhancement layer pictures occur at exactly
the same temporal instants.

The picture arrangement of Fig. G12a can be
used, for example, for migration to progressive 960
line, 60 Hz frame rate video. A general block diagram
of a codec employing this picture arrangement is
shown in Fig. C3. The base layer in that example uses
progressive format 720 line, 60 Hz frame rate video,
and the enhancement layer uses progressive format
960 line, 60 Hz frame rate. Each base layer decoded
picture is upsampled by a factor of 3:4 to yield a
prediction for each picture of the enhancement layer. The
enhancement layer consists of I-pictures only with no
temporal prediction of pictures within this layer.
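
The 3:4 upsampling from 720 to 960 lines can be sketched for a
single column of pels. Linear interpolation is an assumption chosen
for brevity; the text does not specify the interpolation filter, and
the function name is hypothetical.

```python
def upsample_3_to_4(column):
    """Upsample one column of pels by 3:4 using linear interpolation."""
    n_in = len(column)
    if n_in == 0 or n_in % 3 != 0:
        raise ValueError("input length must be a positive multiple of 3")
    n_out = n_in * 4 // 3
    output = []
    for j in range(n_out):
        position = j * 3 / 4               # output line in input coordinates
        low = int(position)
        high = min(low + 1, n_in - 1)      # clamp at the bottom edge
        fraction = position - low
        output.append((1 - fraction) * column[low] + fraction * column[high])
    return output
```

Applied to every column of a 720 line picture, this yields the 960
lines needed to predict an enhancement layer picture.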

We extend the concept of I-pictures of MPEG to
what we shall call I¹-pictures for the enhancement
layer. The I¹-pictures are like I-pictures in the sense
that they do not employ temporal prediction within the
same layer. However, they do employ motion
compensated unidirectional prediction from the base
layer.

Fig. G12b shows picture structures for the base
and the enhancement layers; the corresponding
structures are applied to the base and the
enhancement layer encoders. The base layer employs an M=3
structure with two B-pictures between the previous
and the next reference pictures. The enhancement
layer simply consists of I¹-pictures (no temporal
prediction within the same layer) but uses the immediate
previous base layer picture as reference for
motion-compensated prediction.

The picture arrangement of Fig. G12b also can be
used for migration to progressive 960 line, 60 Hz video.
A general block diagram of codecs employing this
picture arrangement is shown in Fig. C4 and Fig. C6.
In Fig. C4, the base layer uses progressive 960 line,
30 Hz video, obtained by selecting only the even numbered
frames from the 60 Hz source, and the enhancement layer
uses progressive 960 line, 30 Hz video, obtained by
selecting the odd numbered frames; the enhancement
layer pictures thus occur at intermediate temporal
instants of the base layer pictures. In Fig. C6, the base
layer and enhancement layers both use interlaced
960 line video, obtained by either fixed or adaptive
progressive-to-two interlace conversion from the 60 Hz
progressive source. The enhancement layer pictures occur
at the same temporal frame instants as the base layer
pictures, though their field order in each frame is
complementary to that in the base layer. For notational
convenience, base layer pictures are labeled as even
numbered frames and enhancement layer pictures as odd
numbered frames. Each base layer decoded picture
is used for prediction of the next picture in the
enhancement layer. The enhancement layer consists of
I1-pictures only, with no temporal prediction of
pictures within this layer.
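
The frame selection described for Fig. C4 can be sketched as follows.
This is a minimal illustration under stated assumptions, not the
encoder itself: the function names split_layers and i1_reference, the
zero-based frame numbering, and the string placeholders standing in
for frames are all hypothetical choices made for this example.

```python
def split_layers(frames):
    """Split a 60 Hz progressive sequence into two 30 Hz layers.

    Even numbered frames go to the base layer and odd numbered frames
    to the enhancement layer. Numbering starts at 0 here, which is an
    assumption of this sketch.
    """
    base = [(n, f) for n, f in enumerate(frames) if n % 2 == 0]
    enhancement = [(n, f) for n, f in enumerate(frames) if n % 2 == 1]
    return base, enhancement


def i1_reference(frame_number):
    """Return the base layer frame number that predicts an I1-picture.

    An I1-picture uses no prediction within its own layer; its only
    reference is the immediately previous base layer picture.
    """
    if frame_number % 2 == 0:
        raise ValueError("enhancement layer pictures are odd numbered")
    return frame_number - 1


# Placeholder frames; a real codec would hold pixel data here.
frames = [f"frame{n}" for n in range(6)]
base, enhancement = split_layers(frames)
base_numbers = [n for n, _ in base]
enh_numbers = [n for n, _ in enhancement]
references = {n: i1_reference(n) for n in enh_numbers}
```

With six input frames, the base layer receives frames 0, 2 and 4, and
each enhancement layer frame n is predicted from base layer frame n-1.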

We can extend the concept of P-pictures of
MPEG to what we call P1-pictures for the enhancement
layer. The P1-pictures are like P-pictures in the
sense that they employ unidirectional temporal
prediction within the same layer. They, however, also
employ motion-compensated unidirectional prediction
from the base layer.

Fig. G12d shows picture structures for the base
and the enhancement layers; the corresponding
structures are applied to the base and the enhancement
layer encoders. The base layer in this example
employs an M=3 structure with two B-pictures between
the previous and the next reference pictures. The
enhancement layer uses M=1, with the first picture an
I1-picture that uses unidirectional motion-compensated
prediction with respect to the previous decoded base
layer picture. The remaining pictures are simply
P1-pictures and use not only motion-compensated
prediction within the same layer but also
motion-compensated prediction with the immediately
previous base layer picture as reference.

As in the other picture arrangements described
above, the picture arrangement of Fig. G12d can be
used for migration to progressive 960 line, 60 Hz video.
A general block diagram of codecs employing this
picture arrangement is shown in Fig. C4 and Fig. C6.
In Fig. C4, the base layer uses progressive 960 line,
30 Hz video, obtained by selecting only the even numbered
frames from the 60 Hz source. The enhancement layer
uses progressive 960 line, 30 Hz video, obtained by
selecting the odd numbered frames. The enhancement
layer pictures thus occur at intermediate temporal
instants of the base layer pictures. In Fig. C6, the base
layer and enhancement layers both use interlaced
960 line video, obtained by either fixed or adaptive
progressive-to-two interlace conversion from the 60 Hz
progressive source. The enhancement layer pictures
occur at the same temporal frame instants as the base
layer pictures, though their field order in each frame
is complementary to that in the base layer. For notational
convenience, base layer pictures are labeled as even
numbered frames and enhancement layer pictures as
odd numbered frames. Each base layer decoded picture
is used for prediction of the next picture in the
enhancement layer. The enhancement layer uses an
M=1 structure with I1- and P1-pictures. The P1-pictures
benefit from two predictions: motion-compensated
prediction with respect to the immediately previous
odd-numbered decoded picture in the same layer, as well
as motion-compensated prediction from the immediately
previous even-numbered decoded picture in
the base layer.

We now extend the concept of I1-pictures introduced
above for the enhancement layer to what we call
I2-pictures. The I2-pictures, like I1-pictures, do not
employ temporal prediction within the same layer.
However, they differ in that they employ motion-compensated
bidirectional prediction instead of unidirectional
prediction from the base layer.

Fig. G12e shows such picture structures for the
base and the enhancement layers; the corresponding
structures are applied to the base and the enhancement
layer encoders. The base layer employs an M=3
structure with two B-pictures between the previous
and the next reference pictures. The enhancement
layer simply consists of I2-pictures (no temporal
prediction within the same layer) but uses the immediately
previous and the immediately next decoded base layer
pictures as references for motion-compensated
prediction.

The picture arrangement of Fig. G12e can be
used for migration to progressive 960 line, 60 Hz video.
A general block diagram of codecs employing this
picture arrangement is shown in Fig. C4 and Fig. C6.
In Fig. C4, the base layer uses progressive 960 line,
30 Hz video, obtained by selecting only the even numbered
frames from a 60 Hz source. The enhancement
layer uses progressive 960 line, 30 Hz video, obtained
by selecting the odd numbered frames from
that 60 Hz source. The enhancement layer pictures
thus occur at intermediate temporal instants of the
base layer pictures. In Fig. C6, the base layer and
enhancement layers both use interlaced 960 line video,
obtained by either fixed or adaptive progressive-to-two
interlace conversion from the 60 Hz progressive source.
The enhancement layer pictures occur at the same
temporal frame instants as the base layer pictures,
though their field order in each frame is complementary
to that in the base layer. For notational convenience,
base layer pictures are labeled as even numbered
frames and enhancement layer pictures as odd numbered
frames. Each base layer decoded picture is
used for prediction of the next picture in the enhancement
layer. The enhancement layer consists of I2-pictures,
which do not use any temporal prediction within
this layer but employ bidirectional motion-compensated
prediction with respect to the base layer.
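
The reference selection for I2-pictures can be sketched as below. This
is an illustration only: the function names i2_references and
required_base_pictures, and the handling of the end of the sequence
(returning None when no later base layer picture exists), are
assumptions of this sketch and are not taken from the description.

```python
def i2_references(frame_number, last_base_frame):
    """Base layer references for an I2-picture at an odd frame number.

    Returns a (previous, following) pair of even frame numbers.
    `following` is None when no later base layer picture exists;
    that end-of-sequence rule is an assumption of this sketch.
    """
    if frame_number % 2 == 0:
        raise ValueError("enhancement layer pictures are odd numbered")
    previous = frame_number - 1
    following = frame_number + 1
    if following > last_base_frame:
        following = None
    return previous, following


def required_base_pictures(frame_number, last_base_frame):
    """List the decoded base layer pictures needed before prediction."""
    previous, following = i2_references(frame_number, last_base_frame)
    return [n for n in (previous, following) if n is not None]


# Base layer holds frames 0, 2, 4; enhancement layer holds 1, 3, 5.
refs_for_3 = i2_references(3, last_base_frame=4)
refs_for_5 = i2_references(5, last_base_frame=4)
needed_for_3 = required_base_pictures(3, last_base_frame=4)
```

Because an I2-picture refers to the next base layer picture as well as
the previous one, both must be decoded before the I2-picture can be
reconstructed.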

We can now also extend the concept of P1-pictures
introduced above for the enhancement layer to
what we call P2-pictures. The P2-pictures, like
P1-pictures, employ temporal prediction within the same
layer. However, they differ because they use
motion-compensated bidirectional prediction instead of
unidirectional prediction from the base layer.

Fig. G12g shows such picture structures for the
base and the enhancement layers; the corresponding
structures are applied to the base and the enhancement
layer encoders. The base layer employs an M=3
structure with two B-pictures between the previous
and the next reference pictures. The enhancement
layer uses an M=1 structure as well as bidirectional
motion-compensated prediction with the immediately
previous and immediately following base layer pictures as
references. The first picture in the enhancement layer
is thus an I2-picture and is followed by P2-pictures.

The picture arrangement of Fig. G12g can be
used for migration to progressive format 960 line, 60
Hz frame rate video and the like. A general block
diagram of codecs employing this picture arrangement
is shown in Fig. C4 and Fig. C6. In Fig. C4, the base
layer uses progressive 960 line, 30 Hz video, obtained by
selecting only the even numbered frames from the 60 Hz
source, and the enhancement layer uses progressive 960
line, 30 Hz video, obtained by selecting the odd numbered
frames; the enhancement layer pictures thus occur at
intermediate temporal instants of the base layer
pictures. In Fig. C6, the base layer and enhancement
layers both use interlaced format, 960 lines, obtained
by either fixed or adaptive progressive-to-two interlace
conversion from the 60 Hz progressive source. The
enhancement layer pictures occur at the same temporal
frame instants as the base layer pictures, though their
field order in each frame is complementary to that in
the base layer. For notational convenience, base layer
pictures are labeled as even numbered frames and
enhancement layer pictures as odd numbered
frames. Each base layer decoded picture is used for
prediction of the next picture in the enhancement
layer. The enhancement layer uses an M=1 structure with
I2- and P2-pictures. The P2-pictures employ
motion-compensated prediction with respect to the
immediately previous odd-numbered decoded picture in
the enhancement layer as well as motion-compensated
predictions from the immediately previous and
immediately next even-numbered decoded pictures in
the base layer.

Fig. G12h shows picture structures for the base
and the enhancement layers; the corresponding
structures are applied to the base and the enhancement
layer encoders. The base layer employs an M=3
structure with two B-pictures between the previous
and the next reference pictures. The enhancement
layer uses an M=3 structure as well as unidirectional
motion-compensated prediction with the immediately
previous base layer picture as reference. Thus, for every
I-, P- and B-picture in the base layer, there exists a
corresponding I1-, P1- and B1-picture in the enhancement
layer.

The picture arrangement of Fig. G12h can be
used for migration to progressive 960 line, 60 Hz video.
A general block diagram of codecs employing this
picture arrangement is shown in Fig. C4 and Fig. C6. In
Fig. C4, the base layer uses progressive 960 line, 30
Hz video, obtained by selecting only the even numbered
frames from the 60 Hz source, and the enhancement layer
uses progressive 960 line, 30 Hz video, obtained by
selecting the odd numbered frames; the enhancement layer
pictures thus occur at intermediate temporal instants
of the base layer pictures. In Fig. C6, the base layer
and enhancement layers both use interlaced 960 line video,
obtained by either fixed or adaptive progressive-to-two
interlace conversion from the 60 Hz progressive
source. The enhancement layer pictures occur at the
same temporal frame instants as the base layer
pictures, though their field order in each frame is
complementary to that in the base layer. For notational
convenience, base layer pictures are labeled as even
numbered frames and enhancement layer pictures as odd
numbered frames. Each base layer decoded picture
is used for prediction of the next picture in the
enhancement layer. The enhancement layer uses an
M=3 structure in which the first picture is an I1-picture
and the remaining pictures are P1- or B1-pictures. The
I1-pictures employ motion-compensated prediction from the
base layer. The P1-pictures benefit from two predictions:
motion-compensated prediction with respect to the
previous (odd numbered) decoded picture in the
same layer, as well as motion-compensated prediction
from the immediately previous (even numbered)
decoded picture in the base layer. The B1-pictures
benefit from three predictions: bidirectional
motion-compensated prediction with reference frames in the
enhancement layer, as well as motion-compensated
prediction with respect to the immediately previous decoded
frame of the base layer.
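
The M=3 enhancement layer structure can be tabulated with a short
sketch. The assumptions here are loud ones: pictures are listed in
display order with zero-based index k, enhancement layer picture k is
taken to be frame 2k+1, and the same-layer references of P1- and
B1-pictures are taken to be the nearest I1- or P1-pictures, following
the usual MPEG convention. The function names are hypothetical.

```python
def picture_types(count, m=3):
    """Picture types in display order: I1 first, P1 every m-th, else B1."""
    types = []
    for k in range(count):
        if k == 0:
            types.append("I1")
        elif k % m == 0:
            types.append("P1")
        else:
            types.append("B1")
    return types


def enh_frame(k):
    """Frame number of enhancement layer picture k (assumed 2k + 1)."""
    return 2 * k + 1


def references(k, count, m=3):
    """Return (same_layer_refs, base_ref) as frame numbers for picture k.

    base_ref is the immediately previous (even numbered) base frame.
    same_layer_refs is empty for I1, one frame for P1, and up to two
    frames (previous and next reference picture) for B1.
    """
    base_ref = 2 * k
    if k == 0:
        same = []
    elif k % m == 0:
        same = [enh_frame(k - m)]
    else:
        prev_anchor = (k // m) * m
        next_anchor = prev_anchor + m
        same = [enh_frame(prev_anchor)]
        if next_anchor < count:  # no later reference near the sequence end
            same.append(enh_frame(next_anchor))
    return same, base_ref


types = picture_types(7)
refs_i1 = references(0, 7)
refs_p1 = references(3, 7)
refs_b1 = references(1, 7)
```

Counting the entries confirms the description: the I1-picture has one
prediction (base layer only), a P1-picture has two, and a B1-picture
between two reference pictures has three.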

In Fig. C4, the base layer uses progressive 960
line, 30 Hz video, obtained by selecting only the even
numbered frames from the 60 Hz source, and the enhancement
layer uses progressive 960 line, 30 Hz video, obtained by
selecting the odd numbered frames; the enhancement
layer pictures thus occur at intermediate temporal
instants of the base layer pictures. In Fig. C6, the
base layer and enhancement layers both use interlaced
960 line video, obtained by either fixed or adaptive
progressive-to-two interlace conversion from the 60 Hz
progressive source. The enhancement layer pictures
occur at the same temporal frame instants as the base
layer pictures, though their field order in each frame
is complementary to that in the base layer. For notational
convenience, base layer pictures are labeled as even
numbered frames and enhancement layer pictures as
odd numbered frames.

Fig. G12i shows additional examples of picture
structures for the base and the enhancement layers.
The corresponding structures are applied to the base
and the enhancement layer encoders as in the other
cases. The base layer employs an M=3 structure with
two B-pictures between the previous and the next
reference pictures. The enhancement layer simply
consists of I-pictures with no temporal prediction
within the same layer, but uses interlaced-to-progressive
interpolated base layer pictures for prediction.
The base layer pictures occur at half the rate of
enhancement layer pictures, but interpolated base layer
pictures occur at the same rate as enhancement layer
pictures.

The picture arrangement of Fig. G12i can be used
for migration to progressive 960 line, 60 Hz video. A
general block diagram of a codec employing this picture
arrangement is shown in Fig. C6. The base layer uses
interlaced 960 line video, obtained by down converting the
60 Hz source from progressive to interlaced format.
Each base layer decoded picture is used for prediction
of two corresponding pictures in the enhancement
layer. The enhancement layer consists of I-pictures
only, predicted with respect to decoded interlaced
frames from the base layer that have been upconverted
to progressive format.

Fig. G12j shows yet additional picture structures
for the base and the enhancement layers. The
corresponding structures are also applied to the base and
the enhancement layer encoders. The base layer
employs an M=3 structure with two B-pictures between the
previous and the next reference pictures. The
enhancement layer uses an M=1 structure whose first picture
is an I-picture predicted from an interlaced-to-progressive
interpolated base layer picture, followed by
P-pictures that use not only the base layer prediction
used by the I-picture but also motion-compensated
prediction within the same layer. The base layer
pictures occur at half the rate of enhancement layer
pictures, but interpolated base layer pictures occur at the
same rate as enhancement layer pictures.

The picture arrangement of Fig. G12j can be used
for migration to progressive 960 line, 60 Hz video. A general
block diagram of a codec employing this picture
arrangement is shown in Fig. C6. The base layer uses
interlaced 960 line video, obtained by converting the
60 Hz source from progressive to interlaced format.
Each base layer decoded picture is used for prediction
of two corresponding pictures in the enhancement
layer. The enhancement layer uses an M=1
structure with the first picture an I-picture, followed by
all P-pictures. Both these picture types are predicted
with respect to decoded interlaced frames from the base
layer that have been upconverted to progressive format.
The P-pictures, however, also utilize motion-compensated
prediction with respect to the immediately previous
decoded picture within the enhancement layer.
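
How one decoded interlaced base layer frame can yield two progressive
prediction pictures is sketched below. The description does not say
which interpolation filter is used, so simple vertical line averaging
is swapped in here purely as an assumption; the function names and
the representation of a picture as a list of rows are hypothetical as
well.

```python
def field_to_progressive(field_lines, parity, height):
    """Upconvert one field to a full-height progressive picture.

    field_lines: rows of the field (lists of numbers), top to bottom.
    parity: 0 if the field holds the even rows of the frame, 1 if odd.
    Missing rows are filled by averaging the rows above and below; at
    a picture border the single available neighbour is repeated.
    Line averaging is an assumed filter, not the patent's method.
    """
    picture = [None] * height
    for i, row in enumerate(field_lines):
        picture[2 * i + parity] = list(row)
    for y in range(height):
        if picture[y] is None:
            above = picture[y - 1] if y > 0 else None
            below = picture[y + 1] if y + 1 < height else None
            if above is None:
                picture[y] = list(below)
            elif below is None:
                picture[y] = list(above)
            else:
                picture[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return picture


def interlaced_frame_to_two_pictures(frame):
    """Split a frame into its two fields and upconvert each one."""
    height = len(frame)
    top_field = frame[0::2]
    bottom_field = frame[1::2]
    return (field_to_progressive(top_field, 0, height),
            field_to_progressive(bottom_field, 1, height))


# A tiny 4-row, 2-column frame used only for illustration.
frame = [[0, 0], [10, 10], [20, 20], [30, 30]]
first, second = interlaced_frame_to_two_pictures(frame)
```

Each base layer frame thus supplies two full-height pictures, which
matches the statement that interpolated base layer pictures occur at
the same rate as enhancement layer pictures.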
Claims

1. A multi-layer video encoder, comprising:

an input for receiving progressive format 
video signals representing a first resolution level 
at a predetermined first frame rate; 

a base layer encoding means responsive 
to the input for producing encoded video signals 
in a predetermined format, at a predetermined re- 
solution level, and at a predetermined frame rate, 
at least one of the format, resolution level, and 
frame rate of the encoded video signals being dif- 
ferent from the format, resolution level, and 
frame rate of the video signals received by the in- 
put; 

an enhancement layer encoding means 
responsive to the input for producing encoded 
video signals in a predetermined format at a pre- 
determined resolution level and at a predeter- 
mined frame rate, at least one of the format, re- 
solution level, and frame rate of the encoded vid- 
eo signals produced by the enhancement layer 
encoding means being different from the format, 
resolution level, and frame rate of the encoded 
video signals produced by the base layer encod- 
ing means; and 

an output channel having a predetermined 
bandwidth shared by the encoded video signals 
produced by the base layer encoding means and 
the enhancement layer encoding means. 

2. A decoder for receiving multi-layer encoded video 
signals from an output channel in an encoder as 
in claim 1 for producing an unencoded video sig- 
nal having one of a plurality of predetermined 
quality levels in response to the received video 
signal. 

3. A multi-layer video encoder, comprising: 

an input for receiving progressive format 
video signals representing a first resolution level 
at a predetermined first frame rate; 

a base layer encoding means responsive 
to the input for producing progressive format en- 
coded video signals representing a second reso- 
lution level less than the first resolution level at a 
predetermined second frame rate less than the 
first frame rate; 

an enhancement layer encoding means 
responsive to the input for producing progressive
format encoded video signals at the first resolu- 
tion level and at the first frame rate; and 

an output channel having a predetermined 
bandwidth for carrying the encoded video signals 
produced by the base layer encoding means and 
the enhancement layer encoding means. 

4. The multi-layer video encoder of claim 3, further
comprising:

a means for causing the encoded video
signals from the base layer encoding means and
the enhancement layer coding means to adaptively
share the bandwidth of the output channel
in response to a predetermined parameter.

5. A multi-layer video encoder, comprising: 

an input for receiving progressive format
video signals representing a first resolution level
at a predetermined first frame rate;

a means for demultiplexing the video signals
received on the input into a first demultiplexed
video signal at a second frame rate less
than the first frame rate and a second demultiplexed
video signal at a third frame rate less than
the first frame rate;

a base layer encoding means responsive
to the first demultiplexed video signal for producing
a progressive format encoded video signal
having the second frame rate;

an enhancement layer encoding means
responsive to the second demultiplexed video
signal for producing a progressive format encoded
video signal having the third frame rate; and

an output channel having a predetermined 
bandwidth for carrying the encoded video signals 
produced by the base encoding means and the 
enhancement encoding means. 

6. The multi-layer video encoder of claim 5, further 
comprising: 

a means for causing the encoded video
signals produced by the base layer encoding
means and the enhancement layer coding means
to adaptively share the bandwidth of the output
channel in response to a predetermined parameter.

7. A multi-level video encoder, comprising:

an input for receiving progressive format
video signals having a predetermined first resolution
level and a predetermined first frame rate;

a means for producing an interlaced format
video signal in response to the progressive
format video signals;

a base layer encoder for producing interlaced
format encoded video signals in response
to the means for producing an interlaced format
video signal;

an enhancement layer encoder for producing
progressive format encoded video signals in
response to the video signals received by the input;
and

an output channel having a predetermined
bandwidth for carrying the encoded video signals
produced by the base layer encoder and the
enhancement layer encoder.

8. The multi-layer video encoder of claim 7, further
comprising:

a means for causing the encoded video
signals produced by the base layer encoding
means and the enhancement layer coding means
to adaptively share the bandwidth of the output
channel in response to a predetermined parameter.

9. A multi-level video encoder, comprising:

an input for receiving progressive format
video signals having a predetermined first resolution
level and a predetermined first frame rate;

a means for converting the video signals
received by the input into a first interlaced format
video signal and a second interlaced format video
signal;

a base layer encoder for producing first
interlaced format encoded video signals in response
to the first interlaced format video signal;

an enhancement layer encoder for producing
second interlaced format encoded video signals
in response to the second interlaced format
video signals received by the input; and

an output channel having a predetermined
bandwidth for carrying the encoded video signals
produced by the base layer encoder and the
enhancement layer encoder.